Abstract

We consider a class of operator-induced norms, acting as finite-dimensional
surrogates to the $L^2$ norm, and study their approximation properties over
Hilbert subspaces of $L^2$. The class includes, as a special case, the usual
empirical norm encountered, for example, in the context of nonparametric
regression in reproducing kernel Hilbert spaces (RKHS). Our results have
implications for the analysis of M-estimators in models based on
finite-dimensional linear approximation of functions, and also for some
related packing problems.

1. Introduction

Given a probability measure $\mathbb{P}$ supported on a compact set
$\mathcal{X} \subset \mathbb{R}^d$, consider the function class

$$ L^2(\mathbb{P}) := \{ f : \mathcal{X} \to \mathbb{R} \mid \|f\|_{L^2(\mathbb{P})} < \infty \}, \tag{1} $$

where $\|f\|_{L^2(\mathbb{P})} := \sqrt{\int_{\mathcal{X}} f^2(x)\, d\mathbb{P}(x)}$
is the usual $L^2$ norm (footnote 1) defined with respect to the measure
$\mathbb{P}$. It is often of interest to construct approximations to this
$L^2$ norm that are "finite-dimensional" in nature, and to study the
quality of approximation over the unit ball of some Hilbert space
$\mathcal{H}$ that is continuously embedded within $L^2$. For example, in
approximation theory and mathematical statistics, a collection of $n$ design
points in $\mathcal{X}$ is often used to define a surrogate for the $L^2$
norm. In other settings, one is given some orthonormal basis of
$L^2(\mathbb{P})$, and defines an approximation based on the sum of squares
of the first $n$ (generalized) Fourier coefficients. For problems of this
type, it is of interest to gain a precise understanding of the approximation
accuracy in terms of its dimension $n$ and other problem parameters.

Footnote 1: We also use $L^2(\mathcal{X})$ or simply $L^2$ to refer to the
space (1), with corresponding conventions for its norm. Also, one can take
$\mathcal{X}$ to be a compact subset of any separable metric space and
$\mathbb{P}$ a (regular) Borel measure.

The goal of this paper is to study such questions in reasonable generality
for the case of Hilbert spaces $\mathcal{H}$. We let
$\Phi_n : \mathcal{H} \to \mathbb{R}^n$ denote a continuous linear operator
on the Hilbert space, which acts by mapping any $f \in \mathcal{H}$ to the
$n$-vector $([\Phi_n f]_1, [\Phi_n f]_2, \ldots, [\Phi_n f]_n)$. This
operator defines the $\Phi_n$-semi-norm

$$ \|f\|_{\Phi_n} := \sqrt{\sum_{i=1}^{n} [\Phi_n f]_i^2}. \tag{2} $$

In the sequel, with a minor abuse of terminology (footnote 2), we refer to
$\|f\|_{\Phi_n}$ as the $\Phi_n$-norm of $f$. Our goal is to study how well
$\|f\|_{\Phi_n}$ approximates $\|f\|_{L^2}$ over the unit ball of
$\mathcal{H}$ as a function of $n$ and other problem parameters. We provide
a number of examples of the sampling operator $\Phi_n$ in Section 2.2.
Since the dependence on the parameter $n$ should be clear, we frequently omit
the subscript to simplify notation.

In order to measure the quality of approximation over $\mathcal{H}$, we
consider the quantity

$$ R_\Phi(\varepsilon) := \sup\{ \|f\|_{L^2}^2 \mid f \in B_{\mathcal{H}},\ \|f\|_\Phi^2 \le \varepsilon^2 \}, \tag{3} $$

where $B_{\mathcal{H}} := \{ f \in \mathcal{H} \mid \|f\|_{\mathcal{H}} \le 1 \}$
is the unit ball of $\mathcal{H}$. The goal of this paper is to obtain sharp
upper bounds on $R_\Phi$. As discussed in Appendix C, a relatively
straightforward argument can be used to translate such upper bounds into
lower bounds on the related quantity

$$ \underline{T}_\Phi(\varepsilon) := \inf\{ \|f\|_\Phi^2 \mid f \in B_{\mathcal{H}},\ \|f\|_{L^2}^2 \ge \varepsilon^2 \}. \tag{4} $$

Footnote 2: This can be justified by identifying $f$ and $g$ if
$\Phi f = \Phi g$, i.e. considering the quotient $\mathcal{H}/\ker\Phi$.
We also note that, for a complete picture of the relationship between the
semi-norm $\|\cdot\|_\Phi$ and the $L^2$ norm, one can also consider the
related pair

$$ T_\Phi(\varepsilon) := \sup\{ \|f\|_\Phi^2 \mid f \in B_{\mathcal{H}},\ \|f\|_{L^2}^2 \le \varepsilon^2 \}, \quad\text{and} \tag{5a} $$
$$ \underline{R}_\Phi(\varepsilon) := \inf\{ \|f\|_{L^2}^2 \mid f \in B_{\mathcal{H}},\ \|f\|_\Phi^2 \ge \varepsilon^2 \}. \tag{5b} $$

Our methods are also applicable to these quantities, but we limit our
treatment to $(R_\Phi, \underline{T}_\Phi)$ so as to keep the contribution
focused.

Certain special cases of linear operators $\Phi$, and associated
functionals, have been studied in past work. In the special case
$\varepsilon = 0$, we have

$$ R_\Phi(0) = \sup\{ \|f\|_{L^2}^2 \mid f \in B_{\mathcal{H}},\ \Phi(f) = 0 \}, $$

a quantity that corresponds to the squared diameter of
$B_{\mathcal{H}} \cap \mathrm{Ker}(\Phi)$, measured in the $L^2$-norm.
Quantities of this type are standard in approximation theory (e.g.,
[1, 2, 3]), for instance in the context of Kolmogorov and Gelfand widths.
Our primary interest in this paper is the more general setting with
$\varepsilon > 0$, for which additional factors are involved in controlling
$R_\Phi(\varepsilon)$. In statistics, there is a literature on the case in
which $\Phi$ is a sampling operator, which maps each function $f$ to a
vector of $n$ samples, and the norm $\|\cdot\|_\Phi$ corresponds to the
empirical $L^2$-norm defined by these samples. When these samples are chosen
randomly, techniques from empirical process theory [4] can be used to
relate the two terms. As discussed in the sequel, our results have
consequences for this setting of random sampling.

As an example of a problem in which an upper bound on $R_\Phi$ is useful,
let us consider a general linear inverse problem, in which the goal is to
recover an estimate of the function $f^*$ based on the noisy observations

$$ y_i = [\Phi f^*]_i + w_i, \qquad i = 1, \ldots, n, $$

where $\{w_i\}$ are zero-mean noise variables, and $f^* \in B_{\mathcal{H}}$
is unknown. An estimate $\hat f$ can be obtained by solving a least-squares
problem over the unit ball of the Hilbert space, that is, by solving the
convex program

$$ \hat f := \arg\min_{f \in B_{\mathcal{H}}} \sum_{i=1}^{n} \big( y_i - [\Phi f]_i \big)^2. $$

For such estimators, there are fairly standard techniques for deriving upper
bounds on the $\Phi$-semi-norm of the deviation $\hat f - f^*$. Our results
in this paper on $R_\Phi$ can then be used to translate this to a
corresponding upper bound on the $L^2$-norm of the deviation $\hat f - f^*$,
which is often a more natural measure of performance.

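As an example where the dual quantity $\underline{T}_\Phi$ might be helpful,
consider the packing problem for a subset $\mathcal{D} \subset B_{\mathcal{H}}$
of the Hilbert ball. Let $M(\varepsilon; \mathcal{D}, \|\cdot\|_{L^2})$ be
the $\varepsilon$-packing number of $\mathcal{D}$ in $\|\cdot\|_{L^2}$, i.e.,
the maximal number of functions $f_1, \ldots, f_M \in \mathcal{D}$ such that
$\|f_i - f_j\|_{L^2} \ge \varepsilon$ for all $i \ne j$, $i, j = 1, \ldots, M$.
Similarly, let $M(\varepsilon; \mathcal{D}, \|\cdot\|_\Phi)$ be the
$\varepsilon$-packing number of $\mathcal{D}$ in the $\|\cdot\|_\Phi$ norm.
Now, suppose that for some fixed $\varepsilon$,
$\underline{T}_\Phi(\varepsilon) > 0$. Then, if we have a collection of
functions $\{f_1, \ldots, f_M\}$ which is an $\varepsilon$-packing of
$\mathcal{D}$ in the $\|\cdot\|_{L^2}$ norm, the same collection will be a
$\sqrt{\underline{T}_\Phi(\varepsilon)}$-packing of $\mathcal{D}$ in
$\|\cdot\|_\Phi$. This implies the following useful relationship between
packing numbers:

$$ M(\varepsilon; \mathcal{D}, \|\cdot\|_{L^2}) \le M\big( \sqrt{\underline{T}_\Phi(\varepsilon)}; \mathcal{D}, \|\cdot\|_\Phi \big). $$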

The remainder of this paper is organized as follows. We begin in Section 2
with background on the Hilbert space set-up, and provide various examples
of the linear operators $\Phi$ to which our results apply. Section 3
contains the statement of our main result, and illustrates some of its
consequences for different Hilbert spaces and linear operators. Finally,
Section 4 is devoted to the proofs of our results.

Notation. For any positive integer $p$, we use $\mathbb{S}_+^p$ to denote
the cone of $p \times p$ positive semidefinite matrices. For
$A, B \in \mathbb{S}_+^p$, we write $A \succeq B$ or $B \preceq A$ to mean
$A - B \in \mathbb{S}_+^p$. For any square matrix $A$, let
$\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ denote its minimal and maximal
eigenvalues, respectively. We will use both $\sqrt{A}$ and $A^{1/2}$ to
denote the symmetric square root of $A \in \mathbb{S}_+^p$. We will use
$\{x_k\} = \{x_k\}_{k=1}^{\infty}$ to denote a (countable) sequence of
objects (e.g. real numbers and functions). Occasionally we might denote an
$n$-vector as $\{x_1, \ldots, x_n\}$. The context will determine whether the
elements between braces are ordered. The symbols $\ell_2 = \ell_2(\mathbb{N})$
are used to denote the Hilbert sequence space consisting of real-valued
sequences equipped with the inner product
$\langle \{x_k\}, \{y_k\} \rangle_{\ell_2} := \sum_{k=1}^{\infty} x_k y_k$.
The corresponding norm is denoted as $\|\cdot\|_{\ell_2}$.

2. Background 

We begin with some background on the class of Hilbert spaces of interest 
in this paper and then proceed to provide some examples of the sampling 
operators of interest. 
2.1. Hilbert spaces

We consider a class of Hilbert function spaces contained within
$L^2(\mathcal{X})$, and defined as follows. Let $\{\psi_k\}_{k=1}^{\infty}$
be an orthonormal sequence (not necessarily a basis) in $L^2(\mathcal{X})$
and let $\sigma_1 \ge \sigma_2 \ge \sigma_3 \ge \cdots > 0$ be a sequence of
positive weights decreasing to zero. Given these two ingredients, we can
consider the class of functions

$$ \mathcal{H} := \Big\{ f \in L^2(\mathbb{P}) \ \Big|\ f = \sum_{k=1}^{\infty} \sqrt{\sigma_k}\, \alpha_k \psi_k, \ \text{for some } \{\alpha_k\}_{k=1}^{\infty} \in \ell_2(\mathbb{N}) \Big\}, \tag{6} $$

where the series in (6) is assumed to converge in $L^2$. (The series
converges since
$\sum_{k=1}^{\infty} (\sqrt{\sigma_k}\, \alpha_k)^2 \le \sigma_1 \|\{\alpha_k\}\|_{\ell_2}^2 < \infty$.)
We refer to the sequence $\{\alpha_k\}_{k=1}^{\infty} \in \ell_2$ as the
representative of $f$. Note that this representation is unique due to
$\sigma_k$ being strictly positive for all $k \in \mathbb{N}$.

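If $f$ and $g$ are two members of $\mathcal{H}$, say with associated
representatives $\alpha = \{\alpha_k\}_{k=1}^{\infty}$ and
$\beta = \{\beta_k\}_{k=1}^{\infty}$, then we can define the inner product

$$ \langle f, g \rangle_{\mathcal{H}} := \sum_{k=1}^{\infty} \alpha_k \beta_k = \langle \alpha, \beta \rangle_{\ell_2}. \tag{7} $$

With this choice of inner product, it can be verified that the space
$\mathcal{H}$ is a Hilbert space. (In fact, $\mathcal{H}$ inherits all the
required properties directly from $\ell_2$.) For future reference, we note
that for two functions $f, g \in \mathcal{H}$ with associated
representatives $\alpha, \beta \in \ell_2$, their $L^2$-based inner product
is given by
$\langle f, g \rangle_{L^2} = \sum_{k=1}^{\infty} \sigma_k \alpha_k \beta_k$
(footnote 3), as follows from the orthonormality of $\{\psi_k\}$ in $L^2$.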

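We note that each $\psi_k$ is in $\mathcal{H}$, as it is represented by a
sequence with a single nonzero element, namely, the $k$-th element which is
equal to $\sigma_k^{-1/2}$. It follows from (7) that
$\langle \sqrt{\sigma_k}\, \psi_k, \sqrt{\sigma_j}\, \psi_j \rangle_{\mathcal{H}} = \delta_{jk}$.
That is, $\{\sqrt{\sigma_k}\, \psi_k\}$ is an orthonormal sequence in
$\mathcal{H}$. Now, let $f \in \mathcal{H}$ be represented by
$\alpha \in \ell_2$. We claim that the series in (6) also converges in
$\mathcal{H}$ norm. In particular,
$\sum_{k=1}^{N} \sqrt{\sigma_k}\, \alpha_k \psi_k$ is in $\mathcal{H}$, as it
is represented by the sequence
$\{\alpha_1, \ldots, \alpha_N, 0, 0, \ldots\} \in \ell_2$. It follows from
(7) that
$\| f - \sum_{k=1}^{N} \sqrt{\sigma_k}\, \alpha_k \psi_k \|_{\mathcal{H}}^2 = \sum_{k=N+1}^{\infty} \alpha_k^2$,
which converges to $0$ as $N \to \infty$. Thus,
$\{\sqrt{\sigma_k}\, \psi_k\}$ is in fact an orthonormal basis for
$\mathcal{H}$.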
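The coefficient description above lends itself to a small numerical illustration. The following is a minimal sketch, assuming a function in $\mathcal{H}$ is stored through a finite truncation of its representative $\alpha$ and that the weights are the hypothetical choice $\sigma_k = k^{-2}$; the helper names are illustrative and not from the paper. It evaluates $\|f\|_{\mathcal{H}}^2 = \sum_k \alpha_k^2$ and $\|f\|_{L^2}^2 = \sum_k \sigma_k \alpha_k^2$, so that the embedding inequality $\|f\|_{L^2}^2 \le \sigma_1 \|f\|_{\mathcal{H}}^2$ can be checked directly.

```python
def hilbert_norm_sq(alpha):
    """Squared H-norm of f: the squared l2 norm of its representative."""
    return sum(a * a for a in alpha)


def l2_norm_sq(alpha, sigma):
    """Squared L2-norm of f = sum_k sqrt(sigma_k) alpha_k psi_k.

    Uses orthonormality of {psi_k} in L2, so cross terms vanish.
    """
    return sum(s * a * a for s, a in zip(sigma, alpha))


# Hypothetical example: sigma_k = k^{-2}, alpha_k = 1/k, truncated to 50 terms.
sigma = [k ** -2.0 for k in range(1, 51)]
alpha = [1.0 / k for k in range(1, 51)]

h_sq = hilbert_norm_sq(alpha)
l2_sq = l2_norm_sq(alpha, sigma)

# Continuity of the embedding: ||f||_{L2}^2 <= sigma_1 * ||f||_H^2.
embedding_holds = l2_sq <= sigma[0] * h_sq
```

The truncation to finitely many coefficients is a convenience of the sketch; the identities themselves hold for any representative in $\ell_2$.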



Footnote 3: In particular, for $f \in \mathcal{H}$,
$\|f\|_{L^2} \le \sqrt{\sigma_1}\, \|f\|_{\mathcal{H}}$, which shows that the
inclusion $\mathcal{H} \subset L^2$ is continuous.



We now turn to a special case of particular importance to us, namely the
reproducing kernel Hilbert space (RKHS) of a continuous kernel. Consider
a symmetric bivariate function
$\mathbb{K} : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, where
$\mathcal{X} \subset \mathbb{R}^d$ is compact (footnote 4). Furthermore,
assume $\mathbb{K}$ to be positive semidefinite and continuous. Consider the
integral operator $I_{\mathbb{K}}$ mapping a function $f \in L^2$ to the
function $I_{\mathbb{K}} f := \int \mathbb{K}(\cdot, y) f(y)\, d\mathbb{P}(y)$.
As a consequence of Mercer's theorem, $I_{\mathbb{K}}$ is a compact operator
from $L^2$ to $C(\mathcal{X})$, the space of continuous functions on
$\mathcal{X}$ equipped with the uniform norm (footnote 5). Let $\{\sigma_k\}$
be the sequence of nonzero eigenvalues of $I_{\mathbb{K}}$, which are
positive, can be ordered in nonincreasing order and converge to zero. Let
$\{\psi_k\}$ be the corresponding eigenfunctions, which are continuous and
can be taken to be orthonormal in $L^2$. With these ingredients, the space
$\mathcal{H}$ defined in equation (6) is the RKHS of the kernel function
$\mathbb{K}$. This can be verified as follows.

As another consequence of Mercer's theorem, $\mathbb{K}$ has the
decomposition

$$ \mathbb{K}(x, y) := \sum_{k=1}^{\infty} \sigma_k \psi_k(x) \psi_k(y), \tag{8} $$

where the convergence is absolute and uniform (in $x$ and $y$). In
particular, for any fixed $y \in \mathcal{X}$, the sequence
$\{\sqrt{\sigma_k}\, \psi_k(y)\}$ is in $\ell_2$. (In fact,
$\sum_{k=1}^{\infty} (\sqrt{\sigma_k}\, \psi_k(y))^2 = \mathbb{K}(y, y) < \infty$.)
Hence, $\mathbb{K}(\cdot, y)$ is in $\mathcal{H}$, as defined in (6), with
representative $\{\sqrt{\sigma_k}\, \psi_k(y)\}$. Furthermore, it can be
verified that the convergence in (6) can be taken to be also pointwise
(footnote 6). To be more specific, for any $f \in \mathcal{H}$ with
representative $\{\alpha_k\}_{k=1}^{\infty} \in \ell_2$, we have
$f(y) = \sum_{k=1}^{\infty} \sqrt{\sigma_k}\, \alpha_k \psi_k(y)$ for all
$y \in \mathcal{X}$. Consequently, by definition of the inner product (7),
we have

$$ \langle f, \mathbb{K}(\cdot, y) \rangle_{\mathcal{H}} = \sum_{k=1}^{\infty} \alpha_k \sqrt{\sigma_k}\, \psi_k(y) = f(y), $$

so that $\mathbb{K}(\cdot, y)$ acts as the representer of evaluation. This
argument shows that for any fixed $y \in \mathcal{X}$, the linear functional
on $\mathcal{H}$ given by $f \mapsto f(y)$ is bounded, since we have

$$ |f(y)| = |\langle f, \mathbb{K}(\cdot, y) \rangle_{\mathcal{H}}| \le \|f\|_{\mathcal{H}}\, \|\mathbb{K}(\cdot, y)\|_{\mathcal{H}}, $$

hence $\mathcal{H}$ is indeed the RKHS of the kernel $\mathbb{K}$. This fact
plays an important role in the sequel, since some of the linear operators
that we consider involve pointwise evaluation.

Footnote 4: Also assume that $\mathbb{P}$ assigns positive mass to every
open Borel subset of $\mathcal{X}$.

Footnote 5: In fact, $I_{\mathbb{K}}$ is well defined over
$L^1 \supset L^2$ and the conclusions about $I_{\mathbb{K}}$ hold as an
operator from $L^1$ to $C(\mathcal{X})$.

Footnote 6: The convergence is actually even stronger, namely it is absolute
and uniform, as can be seen by noting that
$\sum_{k=n+1}^{m} |\alpha_k \sqrt{\sigma_k}\, \psi_k(y)| \le \big( \sum_{k=n+1}^{m} \alpha_k^2 \big)^{1/2} \big( \sum_{k=n+1}^{m} \sigma_k \psi_k^2(y) \big)^{1/2} \le \big( \sum_{k=n+1}^{m} \alpha_k^2 \big)^{1/2} \max_{y \in \mathcal{X}} \sqrt{\mathbb{K}(y, y)}$.

A comment regarding the scope: our general results hold for the basic
setting introduced in equation (6). For those examples that involve pointwise
evaluation, we assume the more refined case of the RKHS described above.

2.2. Linear operators, semi-norms and examples 

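Let $\Phi : \mathcal{H} \to \mathbb{R}^n$ be a continuous linear operator,
with coordinates $[\Phi f]_i$ for $i = 1, 2, \ldots, n$. It defines the
(semi-)inner product

$$ \langle f, g \rangle_\Phi := \langle \Phi f, \Phi g \rangle_{\mathbb{R}^n}, \tag{9} $$

which induces the semi-norm $\|\cdot\|_\Phi$. By the Riesz representation
theorem, for each $i = 1, \ldots, n$, there is a function
$\varphi_i \in \mathcal{H}$ such that
$[\Phi f]_i = \langle \varphi_i, f \rangle_{\mathcal{H}}$ for any
$f \in \mathcal{H}$.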

Let us illustrate the preceding definitions with some examples. 

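Example 1 (Generalized Fourier truncation). Recall the orthonormal sequence
$\{\psi_i\}_{i=1}^{\infty}$ underlying the Hilbert space. Consider the linear
operator $T_{\psi_1^n} : \mathcal{H} \to \mathbb{R}^n$ with coordinates

$$ [T_{\psi_1^n} f]_i := \langle \psi_i, f \rangle_{L^2}, \qquad \text{for } i = 1, 2, \ldots, n. \tag{10} $$

We refer to this operator as the (generalized) Fourier truncation operator,
since it acts by truncating the (generalized) Fourier representation of $f$
to its first $n$ coordinates. More precisely, by construction, if
$f = \sum_{k=1}^{\infty} \sqrt{\sigma_k}\, \alpha_k \psi_k$, then

$$ [\Phi f]_i = \sqrt{\sigma_i}\, \alpha_i, \qquad \text{for } i = 1, 2, \ldots, n. \tag{11} $$

By definition of the Hilbert inner product (7), we have
$\alpha_i = \langle \sqrt{\sigma_i}\, \psi_i, f \rangle_{\mathcal{H}}$, so
that we can write $[\Phi f]_i = \langle \varphi_i, f \rangle_{\mathcal{H}}$,
where $\varphi_i := \sigma_i \psi_i$.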
Example 2 (Domain sampling). A collection $x_1^n := \{x_1, \ldots, x_n\}$ of
points in the domain $\mathcal{X}$ can be used to define the (scaled)
sampling operator $S_{x_1^n} : \mathcal{H} \to \mathbb{R}^n$ via

$$ S_{x_1^n} f := n^{-1/2} \big( f(x_1), \ldots, f(x_n) \big), \qquad \text{for } f \in \mathcal{H}. \tag{12} $$

As previously discussed, when $\mathcal{H}$ is a reproducing kernel Hilbert
space (with kernel $\mathbb{K}$), the (scaled) evaluation functional
$f \mapsto n^{-1/2} f(x_i)$ is bounded, and its Riesz representation is given
by the function $\varphi_i = n^{-1/2}\, \mathbb{K}(\cdot, x_i)$.
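To make the sampling semi-norm concrete, here is a minimal sketch that evaluates $\|f\|_\Phi^2 = n^{-1} \sum_{i=1}^{n} f(x_i)^2$ for the scaled sampling operator (12). The test function $f(x) = x$ on $[0, 1]$, the uniform design $x_i = i/n$, and the helper names are assumptions chosen for illustration; with $\mathbb{P}$ uniform on $[0,1]$ this $f$ has $\|f\|_{L^2}^2 = 1/3$, which the empirical quantity approaches as $n$ grows.

```python
def uniform_design(n):
    """Design points x_i = i/n, i = 1..n (an illustrative choice)."""
    return [i / n for i in range(1, n + 1)]


def sampling_seminorm_sq(f, points):
    """Squared semi-norm induced by the scaled sampling operator:
    ||f||_Phi^2 = (1/n) * sum_i f(x_i)^2."""
    n = len(points)
    return sum(f(x) ** 2 for x in points) / n


def identity(x):
    """Hypothetical test function f(x) = x; ||f||_{L2}^2 = 1/3 on [0, 1]."""
    return x


coarse = sampling_seminorm_sq(identity, uniform_design(10))
fine = sampling_seminorm_sq(identity, uniform_design(1000))
```

For this particular $f$ the sum has the closed form $(n+1)(2n+1)/(6n^2)$, so the gap to $1/3$ decays like $1/(2n)$.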

Example 3 (Weighted domain sampling). Consider the setting of the previous
example. A slight variation on the sampling operator (12) is obtained by
adding some weights to the samples:

$$ W_{x_1^n, w_1^n} f := n^{-1/2} \big( w_1 f(x_1), \ldots, w_n f(x_n) \big), \qquad \text{for } f \in \mathcal{H}, \tag{13} $$

where $w_1^n = (w_1, \ldots, w_n)$ is chosen such that
$\sum_{k=1}^{n} w_k^2 = 1$. Clearly,
$\varphi_i = n^{-1/2}\, w_i\, \mathbb{K}(\cdot, x_i)$.

[As an example of how this might arise, consider approximating $f(t)$ by
$\sum_{k=1}^{n} f(x_k) G_n(t, x_k)$, where $\{G_n(\cdot, x_k)\}$ is a
collection of functions in $L^2(\mathcal{X})$ such that
$\langle G_n(\cdot, x_k), G_n(\cdot, x_j) \rangle_{L^2} = n^{-1} w_k^2\, \delta_{kj}$.
Proper choices of $\{G_n(\cdot, x_i)\}$ might produce better approximations
to the $L^2$ norm in the cases where one insists on choosing elements of
$x_1^n$ to be uniformly spaced, while $\mathbb{P}$ in (1) is not a uniform
distribution. Another slightly different but closely related case is when
one approximates $f^2(t)$ over $\mathcal{X} = [0, 1]$ by, say,
$n^{-1} \sum_{k=1}^{n} f^2(x_k)\, W(n(t - x_k))$ for some function
$W : [-1, 1] \to \mathbb{R}_+$ and $x_k = k/n$. Again, non-uniform weights
are obtained when $\mathbb{P}$ is nonuniform.]



3. Main result and some consequences 

We now turn to the statement of our main result, and the development
of some of its consequences for various models.

3.1. General upper bounds on $R_\Phi(\varepsilon)$

We now turn to upper bounds on $R_\Phi(\varepsilon)$, which was defined
previously in (3). Our bounds are stated in terms of a real-valued function
defined as follows: for matrices $D, M \in \mathbb{S}_+^p$,

$$ \mathcal{L}(t, M, D) := \max\big\{ \lambda_{\max}\big( D - t \sqrt{D}\, M \sqrt{D} \big),\ 0 \big\}, \qquad \text{for } t \ge 0. \tag{14} $$

Here $\sqrt{D}$ denotes the matrix square root, valid for positive
semidefinite matrices.



The upper bounds on $R_\Phi(\varepsilon)$ involve principal submatrices of
certain infinite-dimensional matrices (or equivalently, linear operators on
$\ell_2(\mathbb{N})$) that we define here. Let $\Psi$ be the
infinite-dimensional matrix with entries

$$ [\Psi]_{jk} := \langle \psi_j, \psi_k \rangle_\Phi, \qquad \text{for } j, k = 1, 2, \ldots, \tag{15} $$

and let $\Sigma = \mathrm{diag}\{\sigma_1, \sigma_2, \ldots\}$ be a diagonal
operator. For any $p = 1, 2, \ldots$, we use $\Psi_p$ and $\Psi_{\tilde p}$
to denote the principal submatrices of $\Psi$ on rows and columns indexed by
$\{1, 2, \ldots, p\}$ and $\{p+1, p+2, \ldots\}$, respectively. A similar
notation will be used to denote submatrices of $\Sigma$.

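Theorem 1. For all $\varepsilon \ge 0$, we have:

$$ R_\Phi(\varepsilon) \le \inf_{p \in \mathbb{N}}\ \inf_{t \ge 0} \Big\{ \mathcal{L}(t, \Psi_p, \Sigma_p) + t \Big( \varepsilon + \sqrt{ \lambda_{\max}\big( \Sigma_{\tilde p}^{1/2} \Psi_{\tilde p} \Sigma_{\tilde p}^{1/2} \big) } \Big)^2 + \sigma_{p+1} \Big\}. \tag{16} $$

Moreover, for any $p \in \mathbb{N}$ such that $\lambda_{\min}(\Psi_p) > 0$,
we have

$$ R_\Phi(\varepsilon) \le \Big( 1 - \frac{\sigma_{p+1}}{\sigma_1} \Big) \frac{1}{\lambda_{\min}(\Psi_p)} \Big( \varepsilon + \sqrt{ \lambda_{\max}\big( \Sigma_{\tilde p}^{1/2} \Psi_{\tilde p} \Sigma_{\tilde p}^{1/2} \big) } \Big)^2 + \sigma_{p+1}. \tag{17} $$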

Remark (a). These bounds cannot be improved in general. This is most
easily seen in the special case $\varepsilon = 0$. Setting $p = n$, bound
(17) implies that $R_\Phi(0) \le \sigma_{n+1}$ whenever $\Psi_n$ is strictly
positive definite and $\Psi_{\tilde n} = 0$. This bound is sharp in a
"minimax sense", meaning that equality holds if we take the infimum over all
bounded linear operators $\Phi : \mathcal{H} \to \mathbb{R}^n$. In
particular, it is straightforward to show that

$$ \inf_{\Phi\ \text{surjective}} R_\Phi(0) = \inf_{\Phi\ \text{surjective}}\ \sup_{f \in B_{\mathcal{H}}} \{ \|f\|_{L^2}^2 \mid \Phi f = 0 \} = \sigma_{n+1}, \tag{18} $$

and moreover, this infimum is in fact achieved by some linear operator. Such
results are known from the general theory of $n$-widths for Hilbert spaces
(e.g., see Chapter IV in Pinkus [2] and Chapter 3 of [3]).

Figure 1: Geometry of Fourier truncation. The plot shows the set
$\{ (\|f\|_{L^2}, \|f\|_\Phi) : \|f\|_{\mathcal{H}} \le 1 \} \subset \mathbb{R}^2$
for the case of the (generalized) Fourier truncation operator $T_{\psi_1^n}$.

In the more general setting of $\varepsilon > 0$, there are operators for
which the bound (17) is met with equality. As a simple illustration, recall
the (generalized) Fourier truncation operator $T_{\psi_1^n}$ from Example 1.
First, it can be verified that
$\langle \psi_k, \psi_j \rangle_{T_{\psi_1^n}} = \delta_{jk}$ for
$j, k \le n$ and $\langle \psi_k, \psi_j \rangle_{T_{\psi_1^n}} = 0$
otherwise. Taking $p = n$, we have $\Psi_n = I_n$, that is, the $n$-by-$n$
identity matrix, and $\Psi_{\tilde n} = 0$. Taking $p = n$ in (17), it
follows that for $\varepsilon^2 \le \sigma_1$,

$$ R_{T_{\psi_1^n}}(\varepsilon) \le \Big( 1 - \frac{\sigma_{n+1}}{\sigma_1} \Big) \varepsilon^2 + \sigma_{n+1}. \tag{19} $$

As shown in Appendix E, the bound (19) in fact holds with equality. In other
words, the bounds of Theorem 1 are tight in this case. Also, note that (19)
implies $R_{T_{\psi_1^n}}(0) \le \sigma_{n+1}$, showing that the
(generalized) Fourier truncation operator achieves the minimax bound of
(18). Fig. 1 provides a geometric interpretation of these results.
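As a sanity check on the Fourier truncation case, the following minimal sketch reads (19) as $(1 - \sigma_{n+1}/\sigma_1)\,\varepsilon^2 + \sigma_{n+1}$ for $\varepsilon^2 \le \sigma_1$ and verifies that one feasible sequence attains this value. The candidate sequence (all mass on coordinates $1$ and $n+1$), the weights $\sigma_k = k^{-2}$, and the function names are assumptions made for illustration; the full equality argument is the one the text attributes to Appendix E.

```python
def fourier_truncation_value(sigma, n, eps_sq):
    """Value (1 - sigma_{n+1}/sigma_1) * eps^2 + sigma_{n+1}.

    `sigma` is 0-indexed, so sigma[0] is sigma_1 and sigma[n] is sigma_{n+1}.
    """
    s_first, s_next = sigma[0], sigma[n]
    if not 0.0 <= eps_sq <= s_first:
        raise ValueError("requires 0 <= eps^2 <= sigma_1")
    return (1.0 - s_next / s_first) * eps_sq + s_next


def candidate_sequence(sigma, n, eps_sq):
    """Squared coefficients alpha_k^2 with mass only on k = 1 and k = n + 1."""
    alpha_sq = [0.0] * len(sigma)
    alpha_sq[0] = eps_sq / sigma[0]   # saturates the semi-norm constraint
    alpha_sq[n] = 1.0 - alpha_sq[0]   # remaining mass on the first tail index
    return alpha_sq


# Hypothetical weights sigma_k = k^{-2}, truncation level n = 3, eps^2 = 0.25.
sigma = [k ** -2.0 for k in range(1, 11)]
n, eps_sq = 3, 0.25
alpha_sq = candidate_sequence(sigma, n, eps_sq)

hilbert_sq = sum(alpha_sq)                                    # ||f||_H^2
phi_sq = sum(s * a for s, a in zip(sigma[:n], alpha_sq[:n]))  # ||f||_Phi^2
l2_sq = sum(s * a for s, a in zip(sigma, alpha_sq))           # ||f||_{L2}^2
bound = fourier_truncation_value(sigma, n, eps_sq)
```

The candidate lies in the unit ball, satisfies the semi-norm constraint with equality, and its squared $L^2$ norm coincides with the bound.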



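Remark (b). In general, it might be difficult to obtain a bound on
$\lambda_{\max}\big( \Sigma_{\tilde p}^{1/2} \Psi_{\tilde p} \Sigma_{\tilde p}^{1/2} \big)$,
as it involves the infinite-dimensional matrix $\Psi_{\tilde p}$. One may
obtain a simple (although not usually sharp) bound on this quantity by noting
that for a positive semidefinite matrix, the maximal eigenvalue is bounded by
the trace, that is,

$$ \lambda_{\max}\big( \Sigma_{\tilde p}^{1/2} \Psi_{\tilde p} \Sigma_{\tilde p}^{1/2} \big) \le \mathrm{tr}\big( \Sigma_{\tilde p}^{1/2} \Psi_{\tilde p} \Sigma_{\tilde p}^{1/2} \big) = \sum_{k > p} \sigma_k [\Psi]_{kk}. \tag{20} $$

Another relatively easy-to-handle upper bound is

$$ \lambda_{\max}\big( \Sigma_{\tilde p}^{1/2} \Psi_{\tilde p} \Sigma_{\tilde p}^{1/2} \big) \le \sup_{k > p} \sum_{r > p} \sqrt{\sigma_k}\, \sqrt{\sigma_r}\, \big| [\Psi]_{kr} \big|. \tag{21} $$

These bounds can be used, in combination with appropriate block partitioning
of $\Sigma_{\tilde p}^{1/2} \Psi_{\tilde p} \Sigma_{\tilde p}^{1/2}$, to
provide sharp bounds on the maximal eigenvalue. Block partitioning is useful
due to the following: for a positive semidefinite matrix
$M = \begin{pmatrix} A_1 & C \\ C^T & A_2 \end{pmatrix}$, we have
$\lambda_{\max}(M) \le \lambda_{\max}(A_1) + \lambda_{\max}(A_2)$. We leave
the details on the application of these ideas to the examples in Section 3.2.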

3.2. Some illustrative examples 

Theorem 1 has a number of concrete consequences for different Hilbert
spaces and linear operators, and we illustrate a few of them in the following
subsections.

3.2.1. Random domain sampling 

We begin by stating a corollary of Theorem 1 in application to random
time sampling in a reproducing kernel Hilbert space (RKHS). Recall from
equation (12) the time sampling operator $S_{x_1^n}$, and assume that the
sample points $\{x_1, \ldots, x_n\}$ are drawn in an i.i.d. manner according
to some distribution $\mathbb{P}$ on $\mathcal{X}$. Let us further assume
that the eigenfunctions $\psi_k$, $k \ge 1$, are uniformly bounded
(footnote 7) on $\mathcal{X}$, meaning that

$$ \sup_{k \ge 1}\ \sup_{x \in \mathcal{X}} |\psi_k(x)| \le C_\psi. \tag{22} $$

Finally, we assume that
$\|\sigma\|_1 := \sum_{k=1}^{\infty} \sigma_k < \infty$, and that

$$ \sigma_{pk} \le C_\sigma\, \sigma_k\, \sigma_p, \quad \text{for some positive constant } C_\sigma \text{ and for all large } p, \tag{23} $$
$$ \sum_{k > p^m} \sigma_k \le \sigma_p, \quad \text{for some positive integer } m \text{ and for all large } p. \tag{24} $$

Let $m_\sigma$ be the smallest $m$ for which (24) holds. These conditions on
$\{\sigma_k\}$ are satisfied, for example, for both a polynomial decay
$\sigma_k = O(k^{-\alpha})$ with $\alpha > 1$ and an exponential decay
$\sigma_k = O(\rho^k)$ with $\rho \in (0, 1)$. In particular, for the
polynomial decay, using the tail bound (B.1) in Appendix B, we can take
$m_\sigma = \lceil \alpha/(\alpha - 1) \rceil$ to satisfy (24). For the
exponential decay, we can take $m_\sigma = 1$ for $\rho \in (0, \tfrac12)$
and $m_\sigma = 2$ for $\rho \in (\tfrac12, 1)$ to satisfy (24).
Define the function

$$ \mathcal{G}_n(\varepsilon) := \frac{1}{\sqrt{n}} \sqrt{ \sum_{j=1}^{\infty} \min\{ \sigma_j, \varepsilon^2 \} }, \tag{25} $$

as well as the critical radius

$$ r_n := \inf\{ \varepsilon > 0 : \mathcal{G}_n(\varepsilon) \le \varepsilon^2 \}. \tag{26} $$

Footnote 7: One can replace $\sup_{x \in \mathcal{X}}$ with the essential
supremum with respect to $\mathbb{P}$.

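Corollary 1. Suppose that $r_n > 0$ and
$64\, C_\psi^2\, m_\sigma\, r_n^2 \log(2 n r_n^2) \le 1$. Then for any
$\varepsilon^2 \in [r_n^2, \sigma_1)$, we have

$$ \mathbb{P}\Big[ R_{S_{x_1^n}}(\varepsilon) > (\tilde C_\psi + \tilde C_\sigma)\, \varepsilon^2 \Big] \le 2 \exp\Big( -\frac{1}{64\, C_\psi^2\, r_n^2} \Big), \tag{27} $$

where $\tilde C_\psi := 2(1 + C_\psi)^2$ and
$\tilde C_\sigma := 3(1 + C_\psi)\, C_\sigma \|\sigma\|_1 + 1$.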



We provide the proof of this corollary in Appendix A. As a concrete example,
consider a polynomial decay $\sigma_k = O(k^{-\alpha})$ for $\alpha > 1$,
which satisfies the assumptions on $\{\sigma_k\}$. Using the tail bound
(B.1) in Appendix B, one can verify that
$r_n^2 = O(n^{-\alpha/(\alpha+1)})$. Note that, in this case,

$$ r_n^2 \log(2 n r_n^2) = O\big( n^{-\frac{\alpha}{\alpha+1}} \log n^{\frac{1}{\alpha+1}} \big) = O\big( n^{-\frac{\alpha}{\alpha+1}} \log n \big) \to 0, \qquad n \to \infty. $$

Hence the conditions of Corollary 1 are met for sufficiently large $n$. It
follows that for some constants $C_1$, $C_2$ and $C_3$, we have

$$ R_{S_{x_1^n}}\big( C_1\, n^{-\frac{\alpha}{2(\alpha+1)}} \big) \le C_2\, n^{-\frac{\alpha}{\alpha+1}} $$

with probability $1 - 2\exp\big( -C_3\, n^{\frac{\alpha}{\alpha+1}} \big)$
for sufficiently large $n$.

3.2.2. Sobolev kernel 

Consider the kernel $\mathbb{K}(x, y) = \min(x, y)$ defined on
$\mathcal{X}^2$ where $\mathcal{X} = [0, 1]$. The corresponding RKHS is of
Sobolev type and can be expressed as

$$ \{ f \in L^2(\mathcal{X}) \mid f \text{ is absolutely continuous},\ f(0) = 0 \text{ and } f' \in L^2(\mathcal{X}) \}. $$

Also consider a uniform domain sampling operator $S_{x_1^n}$, that is, that
of (12) with $x_i = i/n$, $i \le n$, and let $\mathbb{P}$ be uniform (i.e.,
the Lebesgue measure restricted to $[0, 1]$).

This setting has the benefit that many interesting quantities can be
computed explicitly, while also having some practical appeal. The following
can be shown about the eigen-decomposition of the integral operator
$I_{\mathbb{K}}$ introduced in Section 2:

$$ \sigma_k = \Big[ \frac{(2k - 1)\pi}{2} \Big]^{-2}, \qquad \psi_k(x) = \sqrt{2}\, \sin\big( \sigma_k^{-1/2} x \big), \qquad k = 1, 2, \ldots $$

In particular, the eigenvalues decay as $\sigma_k = O(k^{-2})$.
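The eigen-decomposition above can be checked numerically against the Mercer expansion (8). The sketch below assumes the reading $\sigma_k = [(2k-1)\pi/2]^{-2}$ and $\psi_k(x) = \sqrt{2}\,\sin((2k-1)\pi x/2)$, and compares a truncated sum $\sum_k \sigma_k \psi_k(x)\psi_k(y)$ with $\min(x, y)$; the truncation level and function names are illustrative choices, not from the paper.

```python
import math


def sobolev_eigenvalue(k):
    """sigma_k = ((2k - 1) * pi / 2)^(-2) for the kernel min(x, y) on [0, 1]."""
    return ((2 * k - 1) * math.pi / 2.0) ** -2


def sobolev_eigenfunction(k, x):
    """psi_k(x) = sqrt(2) * sin((2k - 1) * pi * x / 2)."""
    return math.sqrt(2.0) * math.sin((2 * k - 1) * math.pi * x / 2.0)


def mercer_partial_sum(x, y, num_terms):
    """Truncated Mercer expansion sum_{k <= num_terms} sigma_k psi_k(x) psi_k(y)."""
    return sum(
        sobolev_eigenvalue(k) * sobolev_eigenfunction(k, x) * sobolev_eigenfunction(k, y)
        for k in range(1, num_terms + 1)
    )


# Compare with min(x, y) at a sample point; the tail is O(1 / num_terms).
approx = mercer_partial_sum(0.3, 0.7, 2000)
exact = min(0.3, 0.7)
```

Since $\sigma_k \psi_k(x)\psi_k(y)$ is bounded by $2\sigma_k$ in absolute value, the truncation error after $N$ terms is at most roughly $2/(\pi^2 N)$, consistent with the $O(k^{-2})$ decay.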
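To compute the matrix $\Psi$, we write

$$ [\Psi]_{kr} = \langle \psi_k, \psi_r \rangle_\Phi = \frac{1}{n} \sum_{\ell=1}^{n} \Big\{ \cos\frac{(k - r)\ell\pi}{n} - \cos\frac{(k + r - 1)\ell\pi}{n} \Big\}. \tag{28} $$

We note that $\Psi$ is periodic in $k$ and $r$ with period $2n$. It is easily
verified that $n^{-1} \sum_{\ell=1}^{n} \cos(q\ell\pi/n)$ is equal to $-1/n$
for odd values of $q$ and zero for even values, other than
$q = 0, \pm 2n, \pm 4n, \ldots$, for which it equals $1$. It follows that

$$ [\Psi]_{kr} = \begin{cases} 1 + \frac{1}{n} & \text{if } k - r = 0, \\ -1 - \frac{1}{n} & \text{if } k + r = 2n + 1, \\ \frac{1}{n}(-1)^{k-r} & \text{otherwise}, \end{cases} \tag{29} $$

for $1 \le k, r \le 2n$. Letting $I_s \in \mathbb{R}^n$ be the vector with
entries $(I_s)_j = (-1)^{j+1}$, $j \le n$, we observe that
$\Psi_n = I_n + \frac{1}{n} I_s I_s^T$. It follows that
$\lambda_{\min}(\Psi_n) = 1$. It remains to bound the terms in (17)
involving the infinite sub-block $\Psi_{\tilde n}$.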
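The closed form for the entries of $\Psi$ can be compared with a direct evaluation of the empirical inner products. This is a minimal sketch under the reading of (29) as $1 + 1/n$ on the diagonal, $-1 - 1/n$ when $k + r = 2n + 1$, and $(-1)^{k-r}/n$ otherwise, for $1 \le k, r \le 2n$; the function names and the choice $n = 9$ are assumptions for illustration.

```python
import math


def psi_entry_numeric(k, r, n):
    """<psi_k, psi_r>_Phi for uniform sampling x_l = l/n, computed directly
    from psi_k(x) = sqrt(2) * sin((2k - 1) * pi * x / 2)."""
    total = 0.0
    for ell in range(1, n + 1):
        x = ell / n
        total += (2.0 * math.sin((2 * k - 1) * math.pi * x / 2.0)
                      * math.sin((2 * r - 1) * math.pi * x / 2.0))
    return total / n


def psi_entry_closed_form(k, r, n):
    """Piecewise closed form, intended for 1 <= k, r <= 2n."""
    if k == r:
        return 1.0 + 1.0 / n
    if k + r == 2 * n + 1:
        return -1.0 - 1.0 / n
    return (-1.0) ** (k - r) / n


# Largest discrepancy over one full period, for a hypothetical n = 9.
n = 9
max_gap = max(
    abs(psi_entry_numeric(k, r, n) - psi_entry_closed_form(k, r, n))
    for k in range(1, 2 * n + 1)
    for r in range(1, 2 * n + 1)
)
```

On the leading $n \times n$ block only the first and third cases occur, which is what gives the rank-one structure $I_n + \frac{1}{n} I_s I_s^T$.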

The $\Psi$ matrix of this example, given by (29), shares certain properties
with the $\Psi$ obtained in other situations involving periodic
eigenfunctions $\{\psi_k\}$. We abstract away these properties by introducing
a class of periodic $\Psi$ matrices. We call $\Psi_{\tilde n}$ a sparse
periodic matrix if each row (or column) is periodic and in each period only
a vanishing fraction of elements are large. More precisely,
$\Psi_{\tilde n}$ is sparse periodic if there exist positive integers
$\gamma$ and $\eta$, and positive constants $c_1$ and $c_2$, all independent
of $n$, such that each row of $\Psi_{\tilde n}$ is periodic with period
$\gamma n$, and for any row $k$, there exists a subset of elements
$S_k = \{\ell_1, \ldots, \ell_\eta\} \subset \{1, \ldots, \gamma n\}$ such
that

$$ \big| [\Psi]_{k, n+r} \big| \le c_1, \qquad r \in S_k, \tag{30a} $$
$$ \big| [\Psi]_{k, n+r} \big| \le c_2\, n^{-1}, \qquad r \in \{1, \ldots, \gamma n\} \setminus S_k. \tag{30b} $$

The elements of $S_k$ could depend on $k$, but the cardinality of this set
should be the constant $\eta$, independent of $k$ and $n$. Also, note that we
are indexing rows and columns of $\Psi_{\tilde n}$ by
$\{n+1, n+2, \ldots\}$; in particular, $k \ge n + 1$. For this class, we have
the following lemma, whose proof can be found in Appendix B.
Figure 2: Sparse periodic $\Psi$ matrices. Display (a) is a plot of the
$N$-by-$N$ leading principal submatrix of $\Psi$ for the Sobolev kernel
$(s, t) \mapsto \min\{s, t\}$. Here $n = 9$ and $N = 6n$; the period is
$2n = 18$. Display (b) is the same plot for a Fourier-type kernel. The plots
exhibit sparse periodic patterns as defined in Section 3.2.2.

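Lemma 1. Assume $\Psi_{\tilde n}$ to be sparse periodic as defined above and
$\sigma_k = O(k^{-\alpha})$, $\alpha \ge 2$. Then,

(a) for $\alpha > 2$, $\lambda_{\max}\big( \Sigma_{\tilde n}^{1/2} \Psi_{\tilde n} \Sigma_{\tilde n}^{1/2} \big) = O(n^{-\alpha})$, $n \to \infty$,

(b) for $\alpha = 2$, $\lambda_{\max}\big( \Sigma_{\tilde n}^{1/2} \Psi_{\tilde n} \Sigma_{\tilde n}^{1/2} \big) = O(n^{-2} \log n)$, $n \to \infty$.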

In particular, (29) implies that $\Psi_{\tilde n}$ is sparse periodic with
parameters $\gamma = 2$, $\eta = 2$, $c_1 = 2$ and $c_2 = 1$. Hence, part
(b) of Lemma 1 applies. Now, we can use (17) with $p = n$ to obtain

$$ R_{S_{x_1^n}}(\varepsilon) \le 2\varepsilon^2 + O(n^{-2} \log n), \tag{31} $$

where we have also used $(a + b)^2 \le 2a^2 + 2b^2$.

3.2.3. Fourier-type kernels 

In this example, we consider an RKHS of functions on
$\mathcal{X} = [0, 1] \subset \mathbb{R}$, generated by a Fourier-type
kernel defined as $\mathbb{K}(x, y) := \kappa(x - y)$, $x, y \in [0, 1]$,
where

$$ \kappa(x) = \zeta_0 + 2 \sum_{k=1}^{\infty} \zeta_k \cos(2\pi k x), \qquad x \in [-1, 1]. \tag{32} $$

We assume that $(\zeta_k)$ is an $\mathbb{R}_+$-valued nonincreasing
sequence in $\ell_1$, i.e. $\sum_k \zeta_k < \infty$. Thus, the
trigonometric series in (32) is absolutely (and uniformly) convergent. As
for the operator $\Phi$, we consider the uniform time sampling operator
$S_{x_1^n}$, as in the previous example. That is, the operator defined in
(12) with $x_i = i/n$, $i \le n$. We take $\mathbb{P}$ to be uniform.

This setting again has the benefit of being simple enough to allow for
explicit computations while also being practically important. One can argue
that the eigen-decomposition of the kernel integral operator is given by

$$ \psi_1 = \psi_0^{(c)}, \qquad \psi_{2k} = \psi_k^{(c)}, \qquad \psi_{2k+1} = \psi_k^{(s)}, \qquad k \ge 1, \tag{33} $$
$$ \sigma_1 = \zeta_0, \qquad \sigma_{2k} = \zeta_k, \qquad \sigma_{2k+1} = \zeta_k, \qquad k \ge 1, \tag{34} $$

where $\psi_0^{(c)}(x) := 1$, $\psi_k^{(c)}(x) := \sqrt{2} \cos(2\pi k x)$
and $\psi_k^{(s)}(x) := \sqrt{2} \sin(2\pi k x)$ for $k \ge 1$.

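For any integer $k$, let $((k))_n$ denote $k$ modulo $n$. Also, let
$k \mapsto \delta_k$ be the function defined over integers which is $1$ at
$k = 0$ and zero elsewhere. Let $\iota := \sqrt{-1}$. Using the identity
$n^{-1} \sum_{\ell=1}^{n} \exp(\iota 2\pi k \ell / n) = \delta_{((k))_n}$,
one obtains the following:

$$ \langle \psi_k^{(c)}, \psi_j^{(c)} \rangle_\Phi = \big[ \delta_{((k-j))_n} + \delta_{((k+j))_n} \big] \Big( \frac{1}{\sqrt{2}} \Big)^{\delta_k + \delta_j}, \tag{35a} $$
$$ \langle \psi_k^{(s)}, \psi_j^{(s)} \rangle_\Phi = \delta_{((k-j))_n} - \delta_{((k+j))_n}, \tag{35b} $$
$$ \langle \psi_k^{(c)}, \psi_j^{(s)} \rangle_\Phi = 0, \qquad \text{valid for all } j, k \ge 0. \tag{35c} $$

It follows that $\Psi_n = I_n$ if $n$ is odd and
$\Psi_n = \mathrm{diag}\{1, 1, \ldots, 1, 2\}$ if $n$ is even. In particular,
$\lambda_{\min}(\Psi_n) = 1$ for all $n \ge 1$. It is also clear that the
principal submatrix of $\Psi$ on indices $\{2, 3, \ldots\}$ has periodic rows
and columns with period $2n$. It follows that $\Psi_{\tilde n}$ is sparse
periodic as defined in Section 3.2.2 with parameters $\gamma = 2$,
$\eta = 2$, $c_1 = 2$ and $c_2 = 0$.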
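The exponential-sum identity behind (35a)-(35c) is easy to confirm numerically. The sketch below checks $n^{-1} \sum_{\ell=1}^{n} \exp(\iota 2\pi k \ell / n) = \delta_{((k))_n}$ and, for $k, j \ge 1$, the cosine case $\langle \psi_k^{(c)}, \psi_j^{(c)} \rangle_\Phi = \delta_{((k-j))_n} + \delta_{((k+j))_n}$ under uniform sampling $x_\ell = \ell/n$. The function names and the test value $n = 6$ are assumptions made for illustration.

```python
import cmath
import math


def averaged_exponential(k, n):
    """(1/n) * sum_{l=1}^{n} exp(i * 2 * pi * k * l / n)."""
    return sum(cmath.exp(2j * math.pi * k * ell / n) for ell in range(1, n + 1)) / n


def delta_mod(k, n):
    """Indicator that k is congruent to 0 modulo n."""
    return 1.0 if k % n == 0 else 0.0


def cosine_gram_entry(k, j, n):
    """Empirical inner product of sqrt(2)cos(2 pi k x) and sqrt(2)cos(2 pi j x)
    at the design points x_l = l/n, for frequencies k, j >= 1."""
    return sum(
        2.0 * math.cos(2 * math.pi * k * ell / n) * math.cos(2 * math.pi * j * ell / n)
        for ell in range(1, n + 1)
    ) / n


n = 6
identity_gap = max(
    abs(averaged_exponential(k, n) - delta_mod(k, n)) for k in range(-12, 13)
)
gram_gap = max(
    abs(cosine_gram_entry(k, j, n) - (delta_mod(k - j, n) + delta_mod(k + j, n)))
    for k in range(1, 13)
    for j in range(1, 13)
)
```

The aliasing visible here (nonzero entries whenever $k \pm j$ is a multiple of $n$) is what produces the periodic pattern in $\Psi$.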

Suppose for example that the eigenvalues decay polynomially, say as
$\zeta_k = O(k^{-\alpha})$ for $\alpha > 2$. Then, applying (17) with
$p = n$, in combination with Lemma 1 part (a), we get

$$ R_{S_{x_1^n}}(\varepsilon) \le 2\varepsilon^2 + O(n^{-\alpha}). \tag{36} $$

As another example, consider the exponential decay $\zeta_k = \rho^k$,
$k \ge 1$, for some $\rho \in (0, 1)$, which corresponds to the Poisson
kernel. In this case, the tail sum of $\{\sigma_k\}$ decays as the sequence
itself, namely,
$\sum_{k > n} \sigma_k \le 2 \sum_{k > n} \rho^k = \frac{2\rho}{1 - \rho}\, \rho^n$.
Hence, we can simply use the trace bound (20) together with (17) to obtain

$$ R_{S_{x_1^n}}(\varepsilon) \le 2\varepsilon^2 + O(\rho^n). \tag{37} $$
4. Proof of Theorem [1] 

We now turn to the proof of our main theorem. Recall from Section 12.11 
the correspondence between any / 6 % and a sequence a G £2', also, recall 
the diagonal operator S : £ 2 — ^ ^2 defined by the matrix diagjai, 02, • • •}• 
Using the definition of ffl5l) of the ^ matrix, we have 

||/||| = (a, T}lHT}' 2 a)^ 

By definition of the Hilbert space H, we have ||/||^ = XlfcLi a fc an d 
1 1 /Ilia = J2k aka t- Letting Be 2 = jet E ^2 | ||«||^ 2 — 1} be the unit ball in 
£2, we conclude that R$ can be written as 

R${e) = sup (Q 2 (a) | Q*(a) < e 2 }, (38) 

where we have defined the quadratic functionals 

Q 2 ( a ) := {a,Ea) i2 , and Q*(a) := (a, £ 1/2 ^£ 1/2 a) fe . (39) 
Also let us define the symmetric bilinear form 

B$(a,P) := (a,£ 1/2 ^£ 1/2 /3)^ 2 , a,/?G£ 2 , (40) 

whose diagonal is B$( 

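We now upper bound $R_\Phi(\varepsilon)$ using a truncation argument. Define
the set

$$ \mathcal{C} := \{ \alpha \in B_{\ell_2} \mid Q_\Phi(\alpha) \le \varepsilon^2 \}, \tag{41} $$

corresponding to the feasible set for the optimization problem (38). For
each integer $p = 1, 2, \ldots$, consider the following truncated sequence
spaces:

$$ \mathcal{T}_p := \{ \alpha \in \ell_2 \mid \alpha_i = 0 \text{ for all } i > p \}, \quad \text{and} $$
$$ \mathcal{T}_p^{\perp} := \{ \alpha \in \ell_2 \mid \alpha_i = 0 \text{ for all } i = 1, 2, \ldots, p \}. $$

Note that $\ell_2$ is the direct sum of $\mathcal{T}_p$ and
$\mathcal{T}_p^{\perp}$. Consequently, any fixed $\alpha \in \mathcal{C}$ can
be decomposed as $\alpha = \xi + \gamma$ for some (unique)
$\xi \in \mathcal{T}_p$ and $\gamma \in \mathcal{T}_p^{\perp}$. Since
$\Sigma$ is a diagonal operator, we have

$$ Q_2(\alpha) = Q_2(\xi) + Q_2(\gamma). $$

Moreover, since any $\alpha \in \mathcal{C}$ is feasible for the
optimization problem (38), we have

$$ Q_\Phi(\alpha) = Q_\Phi(\xi) + 2 B_\Phi(\xi, \gamma) + Q_\Phi(\gamma) \le \varepsilon^2. \tag{42} $$

Note that since $\gamma \in \mathcal{T}_p^{\perp}$, it can be written as
$\gamma = (0_p, c)$, where $0_p$ is a vector of $p$ zeroes, and
$c = (c_1, c_2, \ldots) \in \ell_2$. Similarly, we can write
$\xi = (x, 0)$ where $x \in \mathbb{R}^p$. Then, each of the terms
$Q_\Phi(\xi)$, $B_\Phi(\xi, \gamma)$, $Q_\Phi(\gamma)$ can be expressed in
terms of block partitions of $\Sigma^{1/2} \Psi \Sigma^{1/2}$. For example,

$$ Q_\Phi(\xi) = \langle x, A x \rangle_{\mathbb{R}^p}, \qquad Q_\Phi(\gamma) = \langle c, D c \rangle_{\ell_2}, \tag{43} $$

where $A := \Sigma_p^{1/2} \Psi_p \Sigma_p^{1/2}$ and
$D := \Sigma_{\tilde p}^{1/2} \Psi_{\tilde p} \Sigma_{\tilde p}^{1/2}$, in
correspondence with the block partitioning notation of Appendix F. We now
apply inequality (F.2) derived in Appendix F. Fix some $\rho^2 \in (0, 1)$
and take

$$ \kappa^2 := \rho^2\, \lambda_{\max}\big( \Sigma_{\tilde p}^{1/2} \Psi_{\tilde p} \Sigma_{\tilde p}^{1/2} \big), \tag{44} $$

so that condition (F.5) is satisfied. Then, (F.2) implies

$$ Q_\Phi(\xi) + 2 B_\Phi(\xi, \gamma) + Q_\Phi(\gamma) \ge \rho^2\, Q_\Phi(\xi) - \frac{\kappa^2}{1 - \rho^2}\, \|\gamma\|_2^2. \tag{45} $$

Combining (42) and (45), we obtain

$$ Q_\Phi(\xi) \le \frac{\varepsilon^2}{\rho^2} + \frac{ \lambda_{\max}\big( \Sigma_{\tilde p}^{1/2} \Psi_{\tilde p} \Sigma_{\tilde p}^{1/2} \big) }{1 - \rho^2}\, \|\gamma\|_2^2. \tag{46} $$

We further note that
$\|\gamma\|_2^2 \le \|\gamma\|_2^2 + \|\xi\|_2^2 = \|\alpha\|_2^2 \le 1$. It
follows that

$$ Q_\Phi(\xi) \le \tilde\varepsilon^2, \quad \text{where} \quad \tilde\varepsilon^2 := \frac{\varepsilon^2}{\rho^2} + \frac{ \lambda_{\max}\big( \Sigma_{\tilde p}^{1/2} \Psi_{\tilde p} \Sigma_{\tilde p}^{1/2} \big) }{1 - \rho^2}. \tag{47} $$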




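Let us define

$$ \tilde{\mathcal{C}} := \{ \xi \in B_{\ell_2} \cap \mathcal{T}_p \mid Q_\Phi(\xi) \le \tilde\varepsilon^2 \}. \tag{48} $$

Then, our arguments so far show that for $\alpha \in \mathcal{C}$,

$$ Q_2(\alpha) = Q_2(\xi) + Q_2(\gamma) \le \underbrace{ \sup_{\xi \in \tilde{\mathcal{C}}} Q_2(\xi) }_{S_p} + \underbrace{ \sup_{\gamma \in B_{\ell_2} \cap \mathcal{T}_p^{\perp}} Q_2(\gamma) }_{S_p^{\perp}}. \tag{49} $$

Taking the supremum over $\alpha \in \mathcal{C}$ yields the upper bound

$$ R_\Phi(\varepsilon) \le S_p + S_p^{\perp}. $$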

It remains to bound each of the two terms on the right-hand side. Beginning
with the term $S_p^{\perp}$ and recalling the decomposition
$\gamma = (0_p, c)$, we have
$Q_2(\gamma) = \sum_{k=1}^{\infty} \sigma_{k+p}\, c_k^2$, from which it
follows that

$$ S_p^{\perp} = \sup\Big\{ \sum_{k=1}^{\infty} \sigma_{k+p}\, c_k^2 \ \Big|\ \sum_{k=1}^{\infty} c_k^2 \le 1 \Big\} = \sigma_{p+1}, $$

since $\{\sigma_k\}_{k=1}^{\infty}$ is a nonincreasing sequence by
assumption.

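We now control the term $S_p$. Recalling the decomposition $\xi = (x, 0)$
where $x \in \mathbb{R}^p$, we have

$$ \begin{aligned} S_p = \sup_{\xi \in \tilde{\mathcal{C}}} Q_2(\xi) &= \sup\big\{ \langle x, \Sigma_p x \rangle : \langle x, x \rangle \le 1,\ \langle x, \Sigma_p^{1/2} \Psi_p \Sigma_p^{1/2} x \rangle \le \tilde\varepsilon^2 \big\} \\ &= \sup_{\langle x, x \rangle \le 1}\ \inf_{t \ge 0} \big\{ \langle x, \Sigma_p x \rangle + t \big( \tilde\varepsilon^2 - \langle x, \Sigma_p^{1/2} \Psi_p \Sigma_p^{1/2} x \rangle \big) \big\} \\ &\overset{(a)}{\le} \inf_{t \ge 0} \Big\{ \sup_{\langle x, x \rangle \le 1} \langle x, \Sigma_p^{1/2} (I_p - t \Psi_p) \Sigma_p^{1/2} x \rangle + t\, \tilde\varepsilon^2 \Big\}, \end{aligned} $$

where inequality (a) follows by Lagrange (weak) duality. It is not hard to
see that for any symmetric matrix $M$, one has

$$ \sup\{ \langle x, M x \rangle : \langle x, x \rangle \le 1 \} = \max\{ 0, \lambda_{\max}(M) \}. $$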

Putting the pieces together and optimizing over $\rho^2$, noting that

$$ \inf_{r \in (0, 1)} \Big\{ \frac{a}{r} + \frac{b}{1 - r} \Big\} = \big( \sqrt{a} + \sqrt{b} \big)^2 $$

for any $a, b > 0$, completes the proof of the bound (16).
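The scalar identity used in this last step can be checked directly. Setting the derivative of $a/r + b/(1-r)$ to zero gives the minimizer $r^* = \sqrt{a}/(\sqrt{a} + \sqrt{b})$, at which the objective equals $(\sqrt{a} + \sqrt{b})^2$. The following minimal sketch (with illustrative function names and the hypothetical values $a = 4$, $b = 9$) evaluates both sides and scans a grid to confirm that no grid point does better.

```python
import math


def objective(r, a, b):
    """The function r -> a/r + b/(1 - r) on the open interval (0, 1)."""
    return a / r + b / (1.0 - r)


def closed_form_infimum(a, b):
    """Claimed value of the infimum: (sqrt(a) + sqrt(b))^2."""
    return (math.sqrt(a) + math.sqrt(b)) ** 2


def minimizer(a, b):
    """Stationary point r* = sqrt(a) / (sqrt(a) + sqrt(b)), for a, b > 0."""
    return math.sqrt(a) / (math.sqrt(a) + math.sqrt(b))


a, b = 4.0, 9.0
r_star = minimizer(a, b)
value_at_minimizer = objective(r_star, a, b)
grid_minimum = min(objective(i / 1000.0, a, b) for i in range(1, 1000))
```

In the proof this is applied with $a = \varepsilon^2$ and $b$ equal to the maximal eigenvalue of the tail block, which produces the squared sum appearing in (16).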



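We now prove bound (17), using the same decomposition and notation
established above, but writing an upper bound on $Q_2(\alpha)$ in a form
slightly different from (49). In particular, the argument leading to (49)
also shows that

$$ R_\Phi(\varepsilon) \le \sup\big\{ Q_2(\xi) + Q_2(\gamma) \ \big|\ \xi \in \mathcal{T}_p,\ \gamma \in \mathcal{T}_p^{\perp},\ \xi + \gamma \in B_{\ell_2},\ Q_\Phi(\xi) \le \tilde\varepsilon^2 \big\}. \tag{50} $$

Recalling the expression (39) for $Q_\Phi(\xi)$ and noting that
$\Psi_p \succeq \lambda_{\min}(\Psi_p) I_p$ implies
$A = \Sigma_p^{1/2} \Psi_p \Sigma_p^{1/2} \succeq \lambda_{\min}(\Psi_p) \Sigma_p$,
we have

$$ Q_\Phi(\xi) \ge \lambda_{\min}(\Psi_p)\, Q_2(\xi). \tag{51} $$

Now, since we are assuming $\lambda_{\min}(\Psi_p) > 0$, we have

$$ R_\Phi(\varepsilon) \le \sup\Big\{ Q_2(\xi) + Q_2(\gamma) \ \Big|\ \xi \in \mathcal{T}_p,\ \gamma \in \mathcal{T}_p^{\perp},\ \xi + \gamma \in B_{\ell_2},\ Q_2(\xi) \le \frac{\tilde\varepsilon^2}{\lambda_{\min}(\Psi_p)} \Big\}. \tag{52} $$

The right-hand side of the above is an instance of the Fourier truncation
problem with $\varepsilon^2$ replaced with
$\tilde\varepsilon^2 / \lambda_{\min}(\Psi_p)$. That problem is worked out in
detail in Appendix E. In particular, applying equation (E.1) in Appendix E
with $\varepsilon^2$ changed to
$\tilde\varepsilon^2 / \lambda_{\min}(\Psi_p)$, followed by the same
optimization over $\rho^2$ as in the proof of (16), completes the proof of
(17). Figure 3 provides a graphical representation of the geometry of the
proof.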

5. Conclusion 

We considered the problem of bounding the (squared) $L^2$ norm of functions in a Hilbert unit ball, based on restrictions on an operator-induced norm acting as a surrogate for the $L^2$ norm. In particular, given that $f \in B_{\mathcal H}$ and $\|f\|_\Phi^2 \le \varepsilon^2$, our results enable us to obtain, by estimating norms of certain finite and infinite dimensional matrices, inequalities of the form

$$\|f\|_{L^2}^2 \le c_1 \varepsilon^2 + h_{\Phi,\mathcal H}(\sigma_n)$$

where $\{\sigma_n\}$ are the eigenvalues of the operator embedding $\mathcal H$ in $L^2$, $h_{\Phi,\mathcal H}(\cdot)$ is an increasing function (depending on $\Phi$ and $\mathcal H$) and $c_1 \ge 1$ is some constant. We considered examples of operators $\Phi$ (uniform time sampling and Fourier truncation) and Hilbert spaces $\mathcal H$ (Sobolev, Fourier-type RKHSs) and showed
Figure 3: Geometry of the proof of (17). Display (a) is a plot of the set $Q := \{(Q_\Sigma(\alpha), Q_\Phi(\alpha)) : \|\alpha\|_{\ell_2} = 1\} \subset \mathbb{R}^2$. This is a convex set as a consequence of the Hausdorff-Toeplitz theorem on convexity of the numerical range and preservation of convexity under projections. Display (b) shows the set $\tilde Q := \mathrm{conv}(0, Q)$, i.e., the convex hull of $\{0\} \cup Q$. Observe that $R_\Phi(\varepsilon) = \sup\{x : (x, y) \in \tilde Q,\ y \le \varepsilon^2\}$. For any fixed $r \in (0,1)$, the bound of (17) is a piecewise linear approximation to one side of $\tilde Q$ as shown in Display (b).

that it is possible to obtain optimal scaling $h_{\Phi,\mathcal H}(\sigma_n) = O(\sigma_n)$ in most of those cases. We also considered random time sampling, under polynomial eigen-decay $\sigma_n = O(n^{-\alpha})$, and effectively showed that $h_{\Phi,\mathcal H}(\sigma_n) = O(n^{-\alpha/(\alpha+1)})$ (for $\varepsilon$ small enough), with high probability as $n \to \infty$. This last result complements those on related quantities obtained by techniques from empirical process theory, and we conjecture it to be sharp.
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Appendix A. Analysis of random time sampling 

This section is devoted to the proof of Corollary 1 on random time sampling in reproducing kernel Hilbert spaces. The proof is based on an auxiliary result, which we begin by stating. Fix some positive integer $m$ and define

$$\nu(\varepsilon; m) := \inf\Big\{ p : \sum_{k > p^m} \sigma_k \le \varepsilon^2 \Big\}, \tag{A.1}$$

writing $\nu(\varepsilon)$ for short when $m$ is fixed. With this notation, we have

Lemma 2. Assume $\varepsilon^2 < \sigma_1$ and $32\, C_\psi^2\, m\, \nu(\varepsilon) \log \nu(\varepsilon) \le n$. Then,

$$P\big\{ R_{S_{x_1^n}}(\varepsilon) > \tilde C_\psi\, \varepsilon^2 + \tilde C_\sigma\, \sigma_{\nu(\varepsilon)} \big\} \le 2 \exp\Big( -\frac{1}{32\, C_\psi^2}\, \frac{n}{\nu(\varepsilon)} \Big). \tag{A.2}$$

We prove this claim in Appendix A.2 below.

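Appendix A.1. Proof of Corollary 1

To apply the lemma, recall that we assume that there exists $m$ such that for all (large) $p$, one has

$$\sum_{k > p^m} \sigma_k \le \sigma_p, \tag{A.3}$$

and we let $m_\sigma$ be the smallest such $m$. We define

$$\mu(\varepsilon) := \inf\{ p : \sigma_p \le \varepsilon^2 \}, \tag{A.4}$$

and note that by (A.3), we have $\nu(\varepsilon; m_\sigma) \le \mu(\varepsilon)$. Then, Lemma 2 states that as long as $\varepsilon^2 < \sigma_1$ and $32\, C_\psi^2\, m_\sigma\, \mu(\varepsilon) \log \mu(\varepsilon) \le n$, we have

$$P\big\{ R_{S_{x_1^n}}(\varepsilon) > (\tilde C_\psi + \tilde C_\sigma)\, \varepsilon^2 \big\} \le 2 \exp\Big( -\frac{1}{32\, C_\psi^2}\, \frac{n}{\mu(\varepsilon)} \Big). \tag{A.5}$$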
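To make the two indices concrete, the following sketch computes $\nu(\varepsilon; m)$ and $\mu(\varepsilon)$ for a hypothetical eigenvalue sequence $\sigma_k = k^{-2}$, truncated at $K = 10^6$ terms so that tail sums are finite sums. The sequence, the truncation level and the function names are assumptions made for illustration; for this sequence $\sum_{k > p^2} k^{-2} < p^{-2}$, so condition (A.3) holds with $m = 2$.

```python
import numpy as np

def nu(eps2, sigma, m):
    """Smallest p >= 0 with sum_{k > p^m} sigma_k <= eps2 (sigma truncated to a finite array)."""
    # tail[j] = sum_{k > j} sigma_k for j = 0, ..., K, using 1-based indexing for sigma_k
    tail = np.concatenate([np.cumsum(sigma[::-1])[::-1], [0.0]])
    p = 0
    while p**m < len(sigma) and tail[p**m] > eps2:
        p += 1
    return p

def mu(eps2, sigma):
    """Smallest p >= 1 with sigma_p <= eps2 (assumes such a p exists within the truncation)."""
    hits = np.nonzero(sigma <= eps2)[0]
    return int(hits[0]) + 1

# Hypothetical polynomially decaying eigenvalues sigma_k = k^{-2}, k = 1, ..., 10^6
sigma = 1.0 / np.arange(1, 10**6 + 1, dtype=float) ** 2
eps2 = 0.0101
nu_val = nu(eps2, sigma, m=2)
mu_val = mu(eps2, sigma)
```

In this example both indices equal 10, consistent with the inequality $\nu(\varepsilon; m_\sigma) \le \mu(\varepsilon)$.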

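Now by the definition of $\mu(\varepsilon)$, we have $\sigma_j > \varepsilon^2$ for $j < \mu(\varepsilon)$, and $\mu(\varepsilon) \ge 2$ when $\varepsilon^2 < \sigma_1$. One can argue that $\varepsilon \mapsto \mathcal{G}_n(\varepsilon)/\varepsilon$ is nonincreasing. It follows from definition (26) that for $\varepsilon \ge r_n$, we have

$$\mu(\varepsilon) \le 2n \Big( \frac{\mathcal{G}_n(\varepsilon)}{\varepsilon} \Big)^2 \le 2n \Big( \frac{\mathcal{G}_n(r_n)}{r_n} \Big)^2 \le 2n\, r_n^2,$$

which completes the proof of Corollary 1.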
Appendix A.2. Proof of Lemma 2

For $\xi \in \mathbb{R}^p$, let $\xi \otimes \xi$ be the rank-one operator on $\mathbb{R}^p$ given by $\eta \mapsto \langle \xi, \eta\rangle_2\, \xi$. For an operator $A$ on $\mathbb{R}^p$, let $|||A|||_2$ denote its usual operator norm, $|||A|||_2 := \sup_{\|x\|_2 \le 1} \|Ax\|_2$. Recall that for a symmetric (i.e., real self-adjoint) operator $A$ on $\mathbb{R}^p$, $|||A|||_2 = \sup\{|\lambda| : \lambda \text{ an eigenvalue of } A\}$. It follows that $|||A|||_2 \le \alpha$ is equivalent to $-\alpha I_p \preceq A \preceq \alpha I_p$.
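These two facts about symmetric operators can be checked numerically. The sketch below uses a random symmetric test matrix (an illustrative assumption) and verifies that the operator norm equals the largest eigenvalue in absolute value, and that $\alpha I_p \pm A$ are positive semidefinite when $\alpha = |||A|||_2$.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2.0                     # random symmetric test matrix

eigs = np.linalg.eigvalsh(A)            # real eigenvalues, ascending
op_norm = np.linalg.norm(A, 2)          # operator norm = largest singular value
alpha = op_norm

# -alpha*I <= A <= alpha*I in the semidefinite order means both matrices below are PSD
gap_upper = np.linalg.eigvalsh(alpha * np.eye(5) - A).min()
gap_lower = np.linalg.eigvalsh(A + alpha * np.eye(5)).min()
```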

Our approach is to first show that $|||\Psi_p - I_p|||_2 \le \frac{1}{2}$ for some properly chosen $p$ with high probability. It then follows that $\lambda_{\min}(\Psi_p) \ge \frac{1}{2}$ and we can use bound (17) for that value of $p$. Then, we need to control $\lambda_{\max}\big(\Sigma_{\tilde p}^{1/2} \Psi_{\tilde p} \Sigma_{\tilde p}^{1/2}\big)$. To do this, we further partition $\Psi_{\tilde p}$ into blocks. In order to have a consistent notation, we look at the whole matrix $\Psi$ and let $\Psi^{(k)}$ be the principal submatrix indexed by $\{(k-1)p + 1, \dots, (k-1)p + p\}$, for $k = 1, 2, \dots, p^{m-1}$. Throughout the proof, $m$ is assumed to be a fixed positive integer. Also, let $\Psi^{(\infty)}$ be the principal submatrix of $\Psi$ indexed by $\{p^m + 1, p^m + 2, \dots\}$. This provides a full partitioning of $\Psi$ for which $\Psi^{(1)}, \dots, \Psi^{(p^{m-1})}$ and $\Psi^{(\infty)}$ are the diagonal blocks, the first $p^{m-1}$ of which are $p$-by-$p$ matrices and the last an infinite matrix. To connect with our previous notations, we note that $\Psi^{(1)} = \Psi_p$ and that $\Psi^{(2)}, \dots, \Psi^{(p^{m-1})}, \Psi^{(\infty)}$ are diagonal blocks of $\Psi_{\tilde p}$. Let us also partition the $\Sigma$ matrix and name its diagonal blocks similarly.

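We will argue that, in fact, we have $|||\Psi^{(k)} - I_p|||_2 \le \frac{1}{2}$ for all $k = 1, \dots, p^{m-1}$, with high probability. Let $\mathcal{A}_p$ denote the event on which this claim holds. In particular, on event $\mathcal{A}_p$, we have $\Psi^{(k)} \preceq \frac{3}{2} I_p$ for $k = 2, \dots, p^{m-1}$; hence, we can write

$$\lambda_{\max}\big(\Sigma_{\tilde p}^{1/2} \Psi_{\tilde p} \Sigma_{\tilde p}^{1/2}\big) \le \sum_{k=2}^{p^{m-1}} \lambda_{\max}\big(\sqrt{\Sigma^{(k)}}\, \Psi^{(k)} \sqrt{\Sigma^{(k)}}\big) + \lambda_{\max}\big(\sqrt{\Sigma^{(\infty)}}\, \Psi^{(\infty)} \sqrt{\Sigma^{(\infty)}}\big)$$
$$\le \frac{3}{2} \sum_{k=2}^{p^{m-1}} \lambda_{\max}\big(\Sigma^{(k)}\big) + \mathrm{tr}\big(\sqrt{\Sigma^{(\infty)}}\, \Psi^{(\infty)} \sqrt{\Sigma^{(\infty)}}\big)$$
$$= \frac{3}{2} \sum_{k=2}^{p^{m-1}} \sigma_{(k-1)p+1} + \sum_{k > p^m} \sigma_k\, [\Psi]_{kk}. \tag{A.6}$$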
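The first two steps of (A.6) rest on two facts about positive semidefinite matrices: the largest eigenvalue is at most the sum of the largest eigenvalues of the diagonal blocks, and the largest eigenvalue of a block is at most its trace. The sketch below checks both on a random finite PSD matrix; the matrix and the block layout are illustrative assumptions, not the matrices of the proof.

```python
import numpy as np

rng = np.random.default_rng(1)
G = rng.standard_normal((9, 12))
M = G @ G.T                                    # random 9x9 positive semidefinite matrix
blocks = [slice(0, 3), slice(3, 6), slice(6, 9)]

lam_full = np.linalg.eigvalsh(M)[-1]
lam_blocks = [np.linalg.eigvalsh(M[b, b])[-1] for b in blocks]

# For a PSD block, lambda_max is bounded by the trace (all eigenvalues are nonnegative)
last_block_trace = np.trace(M[blocks[-1], blocks[-1]])
```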

Using assumptions (23) on the sequence $\{\sigma_k\}$, the first sum can be bounded as

$$\sum_{k=2}^{p^{m-1}} \sigma_{(k-1)p+1} \le \sum_{k=2}^{p^{m-1}} \sigma_{(k-1)p} \le \sum_{k=2}^{p^{m-1}} C_\sigma\, \sigma_{k-1}\, \sigma_p \le C_\sigma\, \|\sigma\|_1\, \sigma_p.$$

Using the uniform boundedness assumption (22), we have $[\Psi]_{kk} = n^{-1} \sum_{i=1}^{n} \psi_k^2(x_i) \le C_\psi^2$. Hence the second sum in (A.6) is bounded above by $C_\psi^2 \sum_{k > p^m} \sigma_k$.

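We can now apply Theorem 1. Assume for the moment that $\varepsilon^2 \ge \sum_{k > p^m} \sigma_k$ so that the right-hand side of (A.6) is bounded above by $\frac{3}{2} C_\sigma \|\sigma\|_1 \sigma_p + C_\psi^2 \varepsilon^2$. Applying bound (17), on event $\mathcal{A}_p$, with (see footnote 8) $r = (1 + C_\psi)^{-1}$, we get

$$R_{S_{x_1^n}}(\varepsilon^2) \le 2\Big\{ r^{-1} \varepsilon^2 + (1-r)^{-1} \Big( \tfrac{3}{2} C_\sigma \|\sigma\|_1 \sigma_p + C_\psi^2 \varepsilon^2 \Big) \Big\} + \sigma_{p+1}$$
$$= 2(1 + C_\psi)^2 \varepsilon^2 + 3(1 + C_\psi^{-1})\, C_\sigma \|\sigma\|_1 \sigma_p + \sigma_{p+1}$$
$$\le \tilde C_\psi\, \varepsilon^2 + \tilde C_\sigma\, \sigma_p,$$

where $\tilde C_\psi := 2(1 + C_\psi)^2$ and $\tilde C_\sigma := 3(1 + C_\psi^{-1})\, C_\sigma \|\sigma\|_1 + 1$. To summarize, we have shown the following:

$$\text{Event } \mathcal{A}_p \ \text{ and } \ \varepsilon^2 \ge \sum_{k > p^m} \sigma_k \quad \Longrightarrow \quad R_{S_{x_1^n}}(\varepsilon^2) \le \tilde C_\psi\, \varepsilon^2 + \tilde C_\sigma\, \sigma_p. \tag{A.7}$$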

It remains to control the probability of $\mathcal{A}_p := \bigcap_{k=1}^{p^{m-1}} \big\{ |||\Psi^{(k)} - I_p|||_2 \le \frac{1}{2} \big\}$. We start with the deviation bound on $\Psi^{(1)} - I_p$, and then extend by union bound. We will use the following lemma which follows, for example, from
the Ahlswede- Winter bound l8|, or from [9]. (See also [lfl, O, Q.) 



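Lemma 3. Let $\xi_1, \dots, \xi_n$ be i.i.d. random vectors in $\mathbb{R}^p$ with $E\, \xi_1 \otimes \xi_1 = I_p$ and $\|\xi_1\|_2 \le C_p$ almost surely for some constant $C_p$. Then, for $\delta \in (0, 1)$,

$$P\Big\{ \Big|\Big|\Big| n^{-1} \sum_{i=1}^{n} \xi_i \otimes \xi_i - I_p \Big|\Big|\Big|_2 > \delta \Big\} \le p\, \exp\Big( -\frac{n \delta^2}{4 C_p^2} \Big). \tag{A.8}$$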



Recall that for the time sampling operator, $[\Phi \psi_k]_i = \frac{1}{\sqrt{n}} \psi_k(x_i)$ so that from (15),

$$[\Psi]_{k\ell} = \frac{1}{n} \sum_{i=1}^{n} \psi_k(x_i)\, \psi_\ell(x_i).$$

8 We are using the alternate form of the bound based on $(\sqrt{A} + \sqrt{B})^2 = \inf_{r \in (0,1)} \{ A r^{-1} + B (1-r)^{-1} \}$.
Let $\xi_i := (\psi_k(x_i),\ 1 \le k \le p) \in \mathbb{R}^p$ for $i = 1, \dots, n$. Then, $\{\xi_i\}$ satisfy the conditions of Lemma 3. In particular, letting $e_k$ denote the $k$-th standard basis vector of $\mathbb{R}^p$, we note that

$$\langle e_k, E(\xi_i \otimes \xi_i)\, e_\ell \rangle_2 = E\, \langle e_k, \xi_i\rangle_2\, \langle e_\ell, \xi_i\rangle_2 = \langle \psi_k, \psi_\ell \rangle_{L^2} = \delta_{k\ell}$$

and $\|\xi_i\|_2 \le \sqrt{p}\, C_\psi$, where we have used uniform boundedness of $\{\psi_k\}$ as in (22). Furthermore, we have $\Psi^{(1)} = n^{-1} \sum_{i=1}^{n} \xi_i \otimes \xi_i$. Applying Lemma 3 with $C_p = \sqrt{p}\, C_\psi$ yields

$$P\big\{ |||\Psi^{(1)} - I_p|||_2 > \delta \big\} \le p\, \exp\Big( -\frac{\delta^2 n}{4 C_\psi^2\, p} \Big). \tag{A.9}$$
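The concentration of $\Psi^{(1)}$ around $I_p$ can be observed in a small simulation. The sketch below assumes, purely for illustration, that $P$ is the uniform measure on $[0,1]$ and that $\psi_k(x) = \sqrt{2}\cos(\pi k x)$, which are orthonormal in $L^2[0,1]$ and uniformly bounded by $C_\psi = \sqrt{2}$; the sample size, dimension and names are likewise hypothetical.

```python
import numpy as np

def empirical_gram(x, p):
    """Matrix with entries (1/n) sum_i psi_k(x_i) psi_l(x_i), for psi_k(x) = sqrt(2) cos(pi k x)."""
    k = np.arange(1, p + 1)
    xi = np.sqrt(2.0) * np.cos(np.pi * np.outer(x, k))   # row i is the vector xi_i in R^p
    return xi.T @ xi / len(x)

rng = np.random.default_rng(0)
n, p = 5000, 5
x = rng.uniform(0.0, 1.0, size=n)        # design points drawn i.i.d. from Uniform[0, 1]
Psi_p = empirical_gram(x, p)

deviation = np.linalg.norm(Psi_p - np.eye(p), 2)   # operator norm of Psi_p - I_p
lam_min = np.linalg.eigvalsh(Psi_p)[0]             # smallest eigenvalue of Psi_p
```

When the deviation is at most $1/2$, the smallest eigenvalue is at least $1/2$, which is the property used to invoke bound (17).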

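Similar bounds hold for $\Psi^{(k)}$, $k = 2, \dots, p^{m-1}$. Applying the union bound, we get

$$P \bigcup_{k=1}^{p^{m-1}} \big\{ |||\Psi^{(k)} - I_p|||_2 > \delta \big\} \le \exp\Big( m \log p - \frac{\delta^2 n}{4 C_\psi^2\, p} \Big).$$

For simplicity, let $A = A_{n,p} := n/(4 C_\psi^2\, p)$. We impose $m \log p \le \frac{A}{2} \delta^2$ so that the exponent in (A.9) is bounded above by $-\frac{A}{2} \delta^2$. Furthermore, for our purpose, it is enough to take $\delta = \frac{1}{2}$. It follows that

$$P(\mathcal{A}_p^c) = P \bigcup_{k=1}^{p^{m-1}} \big\{ |||\Psi^{(k)} - I_p|||_2 > \tfrac{1}{2} \big\} \le \exp\Big( -\frac{1}{32\, C_\psi^2}\, \frac{n}{p} \Big), \tag{A.10}$$

if $32\, C_\psi^2\, m\, p \log p \le n$. Now, by (A.7), under $\varepsilon^2 \ge \sum_{k > p^m} \sigma_k$, $R_{S_{x_1^n}}(\varepsilon^2) > \tilde C_\psi\, \varepsilon^2 + \tilde C_\sigma\, \sigma_p$ implies $\mathcal{A}_p^c$. Thus, the exponential bound in (A.10) holds for $P\{ R_{S_{x_1^n}}(\varepsilon^2) > \tilde C_\psi\, \varepsilon^2 + \tilde C_\sigma\, \sigma_p \}$ under the assumptions. We are to choose $p$ and the bound is optimized by making $p$ as small as possible. Hence, we take $p$ to be $\nu(\varepsilon) := \inf\{ p : \varepsilon^2 \ge \sum_{k > p^m} \sigma_k \}$, which proves Lemma 2. (Note that, in general, $\nu(\varepsilon)$ takes its values in $\{0, 1, 2, \dots\}$. The assumption $\varepsilon^2 < \sigma_1$ guarantees that $\nu(\varepsilon) \ne 0$.)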

Appendix B. Proof of Lemma 1

Assume $\sigma_k = C k^{-\alpha}$, for some $\alpha \ge 2$. First, note the following upper bound on the tail sum

$$\sum_{k > p} \sigma_k \le C \int_p^{\infty} x^{-\alpha}\, dx = C_1(\alpha)\, p^{1-\alpha}. \tag{B.1}$$

Furthermore, from the bounds (30a) and (30b), we have, for $k \ge n + 1$,

$$[\Psi]_{kk} \le \min\{c_1, c_2\}. \tag{B.2}$$

To simplify notation, let us define $I_n := \{1, 2, \dots, \gamma n\}$.

Consider the case $\alpha > 2$. We will use the $\ell_\infty$-$\ell_\infty$ upper bound of (21), with $p = n$. Fix some $k \ge n + 1$. Note that $\sigma_k \le \sigma_{n+1}$. Then, recalling the assumptions on $\Psi$ and the definition of $S_k$, we have



oo 7" 

\l~Vk\fo~l I [^]m| — V^rt+l \A>"n+r+g7n| [^]fc,n+r+./- » 

£>n+l g=0 r=l 

oo 7n 

= y/^Wfl ^ y /cr n+r+g7n|[^]fc, 
g=0 r=l 



OC 



5=0 r e S k re /„\S fe 

(B.3) 

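Using (B.1), the second double sum in (B.3) is bounded by

$$\sum_{q=0}^{\infty} \sum_{r \in I_n \setminus S_k} \sqrt{\sigma_{n+r+q\gamma n}} \le \sum_{\ell > n} \sqrt{\sigma_\ell} \le C_2(\alpha)\, n^{1-\alpha/2}. \tag{B.4}$$

Recalling that $S_k \subset I_n$ and $|S_k| = \eta$, the first double sum in (B.3) can be bounded as follows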



q=0 reS k g=0 reS k 



-a/2 



9=0 r 6 5 fc 



•a/2 



< v / Cr ? ^(l + g 7 )- a/ V Q/2 



g=0 



9=1 



-a/2^ ? -a/2\ -a/2 



C 3 (a, 7 ^)n- a/2 (B.5) 
where in the last line we have used $\sum_{q=1}^{\infty} q^{-\alpha/2} < \infty$ due to $\alpha/2 > 1$. Combining (B.3), (B.4) and (B.5) and noting that $\sqrt{\sigma_{n+1}} \le \sqrt{C}\, n^{-\alpha/2}$, we obtain



(B.6) 



£>n+l 



Taking supremum over $k > 1$ and applying the $\ell_\infty$-$\ell_\infty$ bound of (21), with $p = n$, concludes the proof of part (a).

Now, consider the case $\alpha = 2$. The above argument breaks down in this case because $\sum_{q=1}^{\infty} q^{-\alpha/2}$ does not converge for $\alpha = 2$. A remedy is to further partition the matrix $\Sigma_{\tilde n}^{1/2} \Psi_{\tilde n} \Sigma_{\tilde n}^{1/2}$. Recall that the rows and columns of this matrix are indexed by $\{n+1, n+2, \dots\}$. Let $A$ be the principal submatrix indexed by $\{n+1, n+2, \dots, n^2\}$ and $D$ be the principal submatrix indexed by $\{n^2+1, n^2+2, \dots\}$. We will use a combination of the bounds (30a) and (30b), and the well-known perturbation bound $\lambda_{\max}\begin{pmatrix} A & C \\ C^T & D \end{pmatrix} \le \lambda_{\max}(A) + \lambda_{\max}(D)$, to write

$$\lambda_{\max}\big(\Sigma_{\tilde n}^{1/2} \Psi_{\tilde n} \Sigma_{\tilde n}^{1/2}\big) \le \lambda_{\max}(A) + \lambda_{\max}(D) \le |||A|||_\infty + \mathrm{tr}(D). \tag{B.7}$$
The second term is bounded as

$$\mathrm{tr}(D) = \sum_{k > n^2} \sigma_k\, [\Psi]_{kk} \le \min\{c_1, c_2\} \sum_{k > n^2} \sigma_k \le \min\{c_1, c_2\}\, C_1(2)\, (n^2)^{1-2} = C_5(\gamma)\, n^{-2}, \tag{B.8}$$

where we have used (B.1) and (B.2). To bound the first term, fix $k \in \{n+1, \dots, n^2\}$. By an argument similar to that of part (a) and noting that $\gamma \ge 1$, hence $\gamma n^2 \ge n^2$, we have



n jn 

\[&k\[&l I [^]m| — V a n+1 -\/ Vn+r+q-ynl [^]fc,rt+r 

?=n+l g=0 r=l 



g=o r e 5 fc re J„\s fe 

(B.9) 
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Using 7 > 1 again, the second double sum in (IB.9[) is bounded as 

" 37n 2 37T1 2 



1=0 r£l n \S k e=n+l 



< J2 v^< v / C^^<v / Clog(3 7 n 2 )<C 6 ( 7 )log 

(B.10) 



n. 



for sufficiently large $n$. Note that we have used the bound $\sum_{\ell=2}^{p} \ell^{-1} \le \int_1^p x^{-1}\, dx = \log p$. The first double sum in (B.9) is bounded as follows



" 2 r" < 



9=0 r G S k 



a 



n+r+q^n 



\/C7 2J /J (n + r + g 7 n) 1 

9=0 r e 5 fc 
n 

< v^r? ^ (1 + g 7 )~ V 1 

9=0 

n 

< VCr)(l + 7 " 1 + 7 " 1 5^g" 1 )^ 1 



9=2 



C 7 (l,v) n 1 log 71, 



(B.ll) 



for $n$ sufficiently large. Combining (B.9), (B.10) and (B.11), taking supremum over $k$ and using the simple bound $\sqrt{\sigma_{n+1}} \le \sqrt{C}\, n^{-1}$, we get



I Alloc < v / Cn- 1 {c 1 C 7 ( 7 ^) ^ + -C 6 ( 7 ) logn} = C 8 ( 7 ,r/) 



logn 



(B.12) 



which in view of (B.8) and (B.7) completes the proof of part (b).



Appendix C. Relationship between $R_\Phi(\varepsilon)$ and $T_\Phi(\varepsilon)$

In this appendix, we prove the claim made in Section [T] about the relation 
between the upper quantities and T$ and the lower quantities and 
R^. We only carry out the proof for the dual version holds for T$. To 
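simplify the argument, we look at slightly different versions of $R_\Phi$ and $T_\Phi$, defined as

$$R^\circ_\Phi(\varepsilon) := \sup\{ \|f\|_{L^2}^2 : f \in B_{\mathcal H},\ \|f\|_\Phi^2 < \varepsilon^2 \}, \tag{C.1}$$
$$T^\circ_\Phi(\delta) := \inf\{ \|f\|_\Phi^2 : f \in B_{\mathcal H},\ \|f\|_{L^2}^2 > \delta^2 \}, \tag{C.2}$$

and prove the following

$$R^{\circ\,-1}_\Phi(\delta) = T^\circ_\Phi(\delta), \tag{C.3}$$

where $R^{\circ\,-1}_\Phi(\delta) := \inf\{ \varepsilon^2 : R^\circ_\Phi(\varepsilon) > \delta^2 \}$ is a generalized inverse of $R^\circ_\Phi$. To see (C.3), we note that $R^\circ_\Phi(\varepsilon) > \delta^2$ iff there exists $f \in B_{\mathcal H}$ such that $\|f\|_\Phi^2 < \varepsilon^2$ and $\|f\|_{L^2}^2 > \delta^2$. But this last statement is equivalent to $T^\circ_\Phi(\delta) < \varepsilon^2$. Hence,

$$R^{\circ\,-1}_\Phi(\delta) = \inf\{ \varepsilon^2 : T^\circ_\Phi(\delta) < \varepsilon^2 \}, \tag{C.4}$$

which proves (C.3).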



Using the following lemma, we can use relation (C.3) to convert upper bounds on $R_\Phi$ to lower bounds on $T_\Phi$.

Lemma 4. Let $t \mapsto p(t)$ be a nondecreasing function (defined on the real line with values in the extended real line). Let $q$ be its generalized inverse defined as $q(s) := \inf\{t : p(t) > s\}$. Let $r$ be a properly invertible (i.e., one-to-one) function such that $p(t) \le r(t)$, for all $t$. Then,

(a) $q(p(t)) \ge t$, for all $t$,

(b) $q(s) \ge r^{-1}(s)$, for all $s$.

Proof. Assume (a) does not hold, that is, $\inf\{\alpha : p(\alpha) > p(t)\} < t$. Then, there exists $\alpha_0$ such that $p(\alpha_0) > p(t)$ and $\alpha_0 < t$. But this contradicts $p(t)$ being nondecreasing. For part (b), note that (a) implies $t \le q(p(t)) \le q(r(t))$, since $q$ is nondecreasing by definition. Letting $t := r^{-1}(s)$ and noting that $r(r^{-1}(s)) = s$, by assumption, proves (b). □
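Both parts of Lemma 4 can be illustrated on a simple example. The sketch below assumes, for illustration only, the step function $p(t) = \lfloor t \rfloor$ and the majorant $r(t) = t + 1/2$, and approximates the infimum in the generalized inverse by a minimum over a finite grid (which can only overestimate the true infimum slightly, so the checked inequalities are not weakened in the wrong direction for part (a)).

```python
import numpy as np

grid = np.linspace(-5.0, 5.0, 2001)   # finite grid standing in for the real line

def p(t):
    return np.floor(t)                # nondecreasing step function (hypothetical choice)

def r(t):
    return t + 0.5                    # one-to-one majorant: floor(t) <= t + 0.5

def r_inv(s):
    return s - 0.5

def q(s):
    """Generalized inverse inf{t : p(t) > s}, with the infimum taken over the grid."""
    hits = grid[p(grid) > s]
    return hits[0] if hits.size else np.inf

part_a = all(q(p(t)) >= t for t in grid)                             # q(p(t)) >= t
part_b = all(q(s) >= r_inv(s) for s in np.linspace(-4.0, 4.0, 161))  # q(s) >= r^{-1}(s)
```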

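Let $p = R^\circ_\Phi$, $q = R^{\circ\,-1}_\Phi$ and $r(t) = At + B$ for some constant $A > 0$. Noting that $R^\circ_\Phi \le R_\Phi$ and $T_\Phi(\cdot + \gamma) \ge T^\circ_\Phi$, for any $\gamma > 0$, we obtain from Lemma 4 and (C.3) that

$$R_\Phi(\varepsilon) \le A \varepsilon^2 + B \quad \Longrightarrow \quad T_\Phi(\delta+) \ge \frac{\delta^2 - B}{A}, \tag{C.5}$$

where $T_\Phi(\delta+)$ denotes the right limit of $T_\Phi$ at $\delta$. This may be used to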
translate an upper bound of the form ffTTl) on i?$ to a corresponding lower 
bound on T $ . 
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Appendix D. The 2x2 subproblem 

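The following subproblem arises in the proof of Theorem 1:

$$F(\varepsilon^2) := \sup\Big\{ \underbrace{\begin{pmatrix} r & s \end{pmatrix} \begin{pmatrix} u^2 & 0 \\ 0 & v^2 \end{pmatrix} \begin{pmatrix} r \\ s \end{pmatrix}}_{=:\ x(r,s)} \;:\; r^2 + s^2 \le 1,\ \underbrace{\begin{pmatrix} r & s \end{pmatrix} \begin{pmatrix} a^2 & 0 \\ 0 & d^2 \end{pmatrix} \begin{pmatrix} r \\ s \end{pmatrix}}_{=:\ y(r,s)} \le \varepsilon^2 \Big\} \tag{D.1}$$

where $u^2$, $v^2$, $a^2$ and $d^2$ are given constants and the optimization is over $(r, s)$. Here, we discuss the solution in some detail; in particular, we provide explicit formulas for $F(\varepsilon^2)$. Without loss of generality assume $u^2 \ge v^2$. Then, it is clear that $F(\varepsilon^2) \le u^2$ and $F(\varepsilon^2) = u^2$ for $\varepsilon^2 \ge u^2$. Thus, we are interested in what happens when $\varepsilon^2 < u^2$.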

The problem is easily solved by drawing a picture. Let $x(r, s)$ and $y(r, s)$ be as denoted in the last display. Consider the set

$$S := \{ (x(r,s), y(r,s)) : r^2 + s^2 \le 1 \}$$
$$= \{ r^2 (u^2, a^2) + s^2 (v^2, d^2) + q^2 (0, 0) : r^2 + s^2 + q^2 = 1 \}$$
$$= \mathrm{conv}\{ (u^2, a^2),\ (v^2, d^2),\ (0, 0) \}. \tag{D.2}$$

That is, $S$ is the convex hull of the three points $(u^2, a^2)$, $(v^2, d^2)$ and the origin $(0, 0)$.

Then, two (or maybe three) different pictures arise depending on whether $a^2 > d^2$ (and whether $d^2 > v^2$ or $d^2 < v^2$) or $a^2 < d^2$; see Fig. D.4. It follows that we have two (or three) different pictures for the function $\varepsilon^2 \mapsto F(\varepsilon^2)$. In particular, for $a^2 > d^2$ and $d^2 < v^2$,

$$F(\varepsilon^2) = v^2 \min\Big\{ \frac{\varepsilon^2}{d^2}, 1 \Big\} + (u^2 - v^2) \max\Big\{ 0, \frac{\varepsilon^2 - d^2}{a^2 - d^2} \Big\}, \tag{D.3}$$

for $a^2 > d^2$ and $d^2 > v^2$, $F(\varepsilon^2) = \varepsilon^2$, and for $a^2 < d^2$,

$$F(\varepsilon^2) = u^2 \min\Big\{ \frac{\varepsilon^2}{a^2}, 1 \Big\}.$$

All the equations above are valid for $\varepsilon^2 \in [0, a^2]$.
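The piecewise-linear formula for the case $a^2 > d^2$, $d^2 < v^2$ can be compared against a brute-force search. In terms of $(R, S) = (r^2, s^2)$ the subproblem is a small linear program, so a grid over the simplex $R + S \le 1$ suffices. The constants below ($u^2 = a^2 = 4$, $v^2 = 1$, $d^2 = 0.5$), the grid resolution and the function names are illustrative assumptions.

```python
import numpy as np

def F_closed(eps2, u2, v2, a2, d2):
    """Formula (D.3) for the case a2 > d2 and d2 < v2."""
    first = v2 * min(eps2 / d2, 1.0)
    second = (u2 - v2) * max(0.0, (eps2 - d2) / (a2 - d2))
    return first + second

def F_grid(eps2, u2, v2, a2, d2, m=401):
    """Brute force over (R, S) = (r^2, s^2) on a grid of the simplex R + S <= 1."""
    R, S = np.meshgrid(np.linspace(0, 1, m), np.linspace(0, 1, m), indexing="ij")
    feasible = (R + S <= 1.0 + 1e-12) & (a2 * R + d2 * S <= eps2 + 1e-12)
    return (u2 * R + v2 * S)[feasible].max()

# Hypothetical constants with a2 = u2, as in the Fourier truncation instance
u2, v2, a2, d2 = 4.0, 1.0, 4.0, 0.5
```

The grid value can fall slightly below the closed form because the optimal vertex need not lie on the grid; the gap shrinks as the resolution `m` grows.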
Figure D.4: Top plots illustrate the set $S$ as defined in (D.2), in various cases. The bottom plots are the corresponding $\varepsilon^2 \mapsto F(\varepsilon^2)$.

Appendix E. Details of the Fourier truncation example 

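Here we establish the claim that the bound (19) holds with equality. Recall that for the (generalized) Fourier truncation operator $T_1^n$, we have

$$R_{T_1^n}(\varepsilon^2) = \sup\Big\{ \sum_{k=1}^{\infty} \sigma_k \alpha_k^2 \;:\; \sum_{k=1}^{\infty} \alpha_k^2 \le 1,\ \sum_{k=1}^{n} \sigma_k \alpha_k^2 \le \varepsilon^2 \Big\}.$$

Let $\alpha = (t\xi, s\gamma)$, where $t, s \in \mathbb{R}$, $\xi = (\xi_1, \dots, \xi_n) \in \mathbb{R}^n$, $\gamma = (\gamma_1, \gamma_2, \dots) \in \ell_2$ and $\|\xi\|_2 = 1 = \|\gamma\|_2$. Let $u^2 = u^2(\xi) := \sum_{k=1}^{n} \sigma_k \xi_k^2$ and $v^2 = v^2(\gamma) := \sum_{k=1}^{\infty} \sigma_{n+k} \gamma_k^2$.

Let us fix $\xi$ and $\gamma$ for now and try to optimize over $t$ and $s$. That is, we look at

$$G(\varepsilon^2; \xi, \gamma) := \sup\{ t^2 u^2 + s^2 v^2 : t^2 + s^2 \le 1,\ t^2 u^2 \le \varepsilon^2 \}.$$

This is an instance of the 2-by-2 problem (D.1), with $a^2 = u^2$ and $d^2 = 0$. Note that our assumption that $u^2 \ge v^2$ holds in this case, for all $\xi$ and $\gamma$, because $\{\sigma_k\}$ is a nonincreasing sequence. Hence, we have, for $\varepsilon^2 \le \sigma_1$,

$$G(\varepsilon^2; \xi, \gamma) = v^2 + (u^2 - v^2) \frac{\varepsilon^2}{u^2} = v^2(\gamma) + \Big( 1 - \frac{v^2(\gamma)}{u^2(\xi)} \Big) \varepsilon^2.$$

Now we can maximize $G(\varepsilon^2; \xi, \gamma)$ over $\xi$ and then $\gamma$. Note that $G$ is increasing in $u^2$. Thus, the maximum is achieved by selecting $u^2$ to be $\sup_{\|\xi\|_2 = 1} u^2(\xi) = \sigma_1$. Thus,

$$\sup_{\xi} G(\varepsilon^2; \xi, \gamma) = \Big( 1 - \frac{\varepsilon^2}{\sigma_1} \Big) v^2(\gamma) + \varepsilon^2.$$

For $\varepsilon^2 \le \sigma_1$, the above is increasing in $v^2$. Hence the maximum is achieved by setting $v^2$ to be $\sup_{\|\gamma\|_2 = 1} v^2(\gamma) = \sigma_{n+1}$. Hence, for $\varepsilon^2 \le \sigma_1$,

$$R_{T_1^n}(\varepsilon^2) = \sup_{\xi, \gamma} G(\varepsilon^2; \xi, \gamma) = \Big( 1 - \frac{\sigma_{n+1}}{\sigma_1} \Big) \varepsilon^2 + \sigma_{n+1}. \tag{E.1}$$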
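Formula (E.1) can be checked against a direct solution of the optimization problem on a truncated sequence. Writing $b_k = \alpha_k^2$ turns the problem into a linear program with two constraints, whose optimum is attained at a vertex with at most two nonzero coordinates; the sketch below enumerates those vertices. The eigenvalues $\sigma_k = k^{-2}$, the truncation at $K = 10$, and the values of $n$ and $\varepsilon^2$ are hypothetical choices for illustration.

```python
import numpy as np
from itertools import combinations

def fourier_truncation_sup(sigma, n, eps2):
    """Maximize sum_k sigma_k b_k over b >= 0, sum_k b_k <= 1, sum_{k<=n} sigma_k b_k <= eps2,
    where b_k stands for alpha_k^2, by enumerating candidate vertices of the feasible set."""
    K = len(sigma)
    c = np.where(np.arange(K) < n, sigma, 0.0)   # constraint weights: sigma_k for k <= n, else 0
    best = 0.0
    for i in range(K):                           # vertices with a single nonzero coordinate
        b = 1.0 if c[i] == 0.0 else min(1.0, eps2 / c[i])
        best = max(best, sigma[i] * b)
    for i, j in combinations(range(K), 2):       # vertices where both constraints are tight
        A = np.array([[1.0, 1.0], [c[i], c[j]]])
        if abs(np.linalg.det(A)) < 1e-14:
            continue
        b = np.linalg.solve(A, np.array([1.0, eps2]))
        if (b >= -1e-12).all():
            best = max(best, sigma[i] * b[0] + sigma[j] * b[1])
    return best

sigma = 1.0 / np.arange(1, 11) ** 2.0    # hypothetical eigenvalues sigma_k = k^{-2}, K = 10
n, eps2 = 3, 0.3
closed_form = (1.0 - sigma[n] / sigma[0]) * eps2 + sigma[n]   # (E.1); sigma[n] is sigma_{n+1}
numeric = fourier_truncation_sup(sigma, n, eps2)
```

In this example the optimum puts mass $\varepsilon^2/\sigma_1$ on the first coordinate and the remainder on coordinate $n+1$, matching the construction in the text.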

Appendix F. A quadratic inequality

In this appendix, we derive an inequality which will be used in the proof of Theorem 1. Consider a positive semidefinite matrix $M$ (possibly infinite-dimensional) partitioned as

$$M = \begin{pmatrix} A & C \\ C^T & D \end{pmatrix}.$$

Assume that there exists $\rho^2 \in (0, 1)$ and $\kappa^2 > 0$ such that

$$\begin{pmatrix} A & C \\ C^T & (1 - \rho^2) D + \kappa^2 I \end{pmatrix} \succeq 0. \tag{F.1}$$

Let $(x, y)$ be a vector partitioned to match the block structure of $M$. Then we have the following.

Lemma 5. Under (F.1), for all $x$ and $y$,

$$x^T A x + 2 x^T C y + y^T D y \ge \rho^2 x^T A x - \frac{\kappa^2}{1 - \rho^2} \|y\|_2^2. \tag{F.2}$$
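Proof. By assumption (F.1), applied to the vector $\big( \sqrt{1 - \rho^2}\, x,\ y / \sqrt{1 - \rho^2} \big)$, we have

$$(1 - \rho^2)\, x^T A x + 2 x^T C y + y^T D y + \frac{\kappa^2}{1 - \rho^2} \|y\|_2^2 \ge 0. \tag{F.3}$$

Rearranging gives (F.2). □

Writing (F.1) as a perturbation of the original matrix,

$$\begin{pmatrix} A & C \\ C^T & D \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & -\rho^2 D + \kappa^2 I \end{pmatrix} \succeq 0, \tag{F.4}$$

we observe that a sufficient condition for (F.1) to hold is $\rho^2 D \preceq \kappa^2 I$. That is, it is sufficient to have

$$\rho^2 \lambda_{\max}(D) \le \kappa^2. \tag{F.5}$$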
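Lemma 5 under the sufficient condition (F.5) can be sanity-checked numerically. The sketch below draws a random finite-dimensional positive semidefinite matrix, sets $\kappa^2 = \rho^2 \lambda_{\max}(D)$, and evaluates the slack in (F.2) at random vectors; the matrix sizes, the value of $\rho^2$ and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
G = rng.standard_normal((7, 10))
M = G @ G.T                                   # random PSD matrix, partitioned with a 3x3 block A
A, C, D = M[:3, :3], M[:3, 3:], M[3:, 3:]

rho2 = 0.4
kappa2 = rho2 * np.linalg.eigvalsh(D)[-1]     # smallest kappa^2 permitted by (F.5)

def slack(x, y):
    """Left side minus right side of (F.2); Lemma 5 says this is nonnegative."""
    lhs = x @ A @ x + 2.0 * x @ C @ y + y @ D @ y
    rhs = rho2 * (x @ A @ x) - kappa2 / (1.0 - rho2) * (y @ y)
    return lhs - rhs

min_slack = min(slack(rng.standard_normal(3), rng.standard_normal(4)) for _ in range(2000))
```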

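Rewriting (F.1) differently, as

$$\begin{pmatrix} (1 - \rho^2) A & 0 \\ 0 & (1 - \rho^2) D \end{pmatrix} + \begin{pmatrix} \rho^2 A & C \\ C^T & \kappa^2 I \end{pmatrix} \succeq 0, \tag{F.6}$$

we find another sufficient condition for (F.1), namely, $\rho^2 A - \kappa^{-2} C C^T \succeq 0$. In particular, it is also sufficient to have

$$\kappa^{-2} \lambda_{\max}(C C^T) \le \rho^2 \lambda_{\min}(A). \tag{F.7}$$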
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